Every leadership role is a challenge in itself, but data science management is challenging in its own way. This is mainly because data science is a newer discipline, and there is little established guidance on how to build and manage a data science team.
While there isn't an outright data team management standard, there are a few proven strategies you can follow to lead your data science team effectively. Whether you're just starting out as a data science leader or already managing a data science team today, here are four things you should know:
1. Equip the team with more than data scientists
This point comes first because companies that act otherwise will burn out their data scientists and lose them. Some organizations hold on to the idea that data scientists can and should build models, deploy them, and keep everything running in production afterward. No one should be expected to do it all. People who can do it all exist, but they are hard to find, afford, and keep. And not every problem requires the specialized skills of a data scientist.
Data scientists already combine advanced math, statistics, machine learning, programming, and domain knowledge to analyze data, create models, and provide results in a helpful format. Creating and managing data pipelines and the infrastructure to deploy and run these models shouldn't be part of their job description. That's for the data engineers and MLOps engineers.
Data engineers are the ones who create and manage data pipelines. And while data engineers are not often mistaken for data scientists, data scientists are often mistaken for data engineers. It is not common to expect a data engineer to create a model, but data scientists are often expected to build production-grade data pipelines. As a leader, separate the roles of a data engineer from a data scientist and delegate responsibilities accordingly.
MLOps engineers are another group that helps keep data scientists from burning out. They are the people who keep the model available to customers. If the model is unstable or constantly failing, you will lose users, and it is the MLOps engineers' job to prevent that. They usually have extensive experience operating highly reliable, large-scale production systems.
Some organizations think they do not need MLOps engineers "because everything is running smoothly." However, MLOps is tasked with ensuring everything continues to run smoothly. So, invest in MLOps early to proactively create a scalable infrastructure before you’re plagued with outages. MLOps teams monitor production, scale infrastructure, and ensure high availability and quick recovery from potential disasters.
And finally, you'll also need data annotators/labelers. It's pretty incredible that data scientists build machine learning algorithms that diagnose illnesses, recommend our favorite songs, and even drive cars for us. But none of this would be possible if no one "told" these AI-powered platforms what they are seeing and how to interpret it.
Data annotators add metadata tags to mark up aspects of text, images, audio, and video clips. In order for artificial intelligence to accurately perceive and interpret the data, the labels must be accurate. A solution using artificial intelligence cannot recognize the patterns in unstructured data sets unless the data is properly annotated. Because this data annotation requires considerable human effort and expertise, companies may hesitate to commit to it in full, but it is a crucial step in the success of any machine learning project.
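One simple way to keep an eye on label accuracy is to have a small sample of items reviewed by an expert and compare each annotator's labels against that sample. The following is a minimal sketch under that assumption; the function name, the annotator IDs, and the data are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def label_accuracy(annotations, reference):
    """Return, per annotator, the share of labels that match the reviewed reference labels.

    annotations: iterable of (annotator, item_id, label) tuples
    reference:   dict mapping item_id -> expert-reviewed label (hypothetical sample)
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for annotator, item_id, label in annotations:
        if item_id not in reference:
            continue  # item was not part of the reviewed sample, so skip it
        total[annotator] += 1
        if label == reference[item_id]:
            correct[annotator] += 1
    return {annotator: correct[annotator] / total[annotator] for annotator in total}


# Hypothetical reviewed sample and annotator output.
reference = {"img1": "cat", "img2": "dog", "img3": "cat", "img4": "bird"}
annotations = [
    ("ann_a", "img1", "cat"),
    ("ann_a", "img2", "dog"),
    ("ann_a", "img3", "dog"),   # disagrees with the reviewed label
    ("ann_a", "img4", "bird"),
    ("ann_b", "img1", "cat"),
    ("ann_b", "img2", "dog"),
    ("ann_b", "img5", "cat"),   # not in the reviewed sample, ignored
]

accuracy = label_accuracy(annotations, reference)
print(accuracy)
```

A low score for one annotator may point to unclear labeling guidelines rather than carelessness, so treat the number as a prompt for a conversation, not a verdict.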
2. Provide a work environment that is pro-career growth
Considering the high demand for data scientists and the number of job openings available, a work environment that proactively supports their growth can help retain data scientists. Data science is a newer field without strong norms around career progression. By being proactive about the team's career growth, you will not put your data scientists in a position where they have to play internal politics or leave the organization to advance professionally.
Have a development plan in place to make sure team members' learning plans align with the expertise required in the team. Creating a development plan involves three steps. First, get an honest assessment of the team members' current technical and non-technical capabilities via feedback from their peers.
Next, define the additional technical expertise and soft skills they require to have the most impact. And lastly, build a plan for closing the skill gap. As a data science leader, you need to guide professional development to ensure that the team is not investing time and energy into non-productive processes.
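The three steps above can be sketched in a few lines of code. This is only an illustration: the 0-to-5 rating scale, the skill names, and the `skill_gaps` helper are assumptions made for the example, not a standard assessment method.

```python
def skill_gaps(current, required):
    """Return (skill, gap) pairs where the current level is below the required level.

    current:  dict of skill -> assessed level (step 1, e.g. from peer feedback)
    required: dict of skill -> level needed for the most impact (step 2)
    The result is sorted with the largest gap first, as input to the plan (step 3).
    Skills missing from `current` are treated as level 0.
    """
    gaps = {
        skill: level - current.get(skill, 0)
        for skill, level in required.items()
        if level > current.get(skill, 0)
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratings on an assumed 0-5 scale.
current = {"statistics": 4, "sql": 3, "stakeholder communication": 2}
required = {
    "statistics": 4,
    "sql": 4,
    "stakeholder communication": 4,
    "experiment design": 3,
}

plan = skill_gaps(current, required)
for skill, gap in plan:
    print(f"{skill}: close a gap of {gap}")
```

Sorting by gap size is one possible way to prioritize; in practice you would also weigh how much each skill matters to the team's upcoming work.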
To manage a data science team, it is also important to provide access to resources such as courses, books, webinars, and conferences to encourage continuous learning. However, learning opportunities are useless without planned time to take advantage of them. Encourage, and even schedule, dedicated time for the team to invest in themselves through continuous learning. Also, help employees keep up to date on the most important techniques and technologies related to data science.
Another way to provide a growth-inclined work environment is to encourage internal learning. For example, you can organize monthly "Lunch and Learn" sessions or skill-sharing workshops where senior data scientists on the team guide junior data scientists. These sessions are also an excellent opportunity to improve communication within the team.
3. Establish "value" as a performance metric
Data science is a field that requires creativity and cannot be effectively measured only by statistics such as model performance. It is essential to assess data science teams based on the value they create internally within the organization or on the benefit they provide externally to the customer. This assessment metric will steer executives and stakeholders away from setting unrealistic metrics like "number of projects completed in a year."
Value is often hard to measure, so, as a data science leader, you have to take deliberate actions to capture, measure, and communicate the value the team is delivering. To track value as a performance metric, you can start by checking how each model contributes to business efficiency, revenue, and cost reduction.
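As a rough illustration of tracking value per model, the sketch below records an estimated revenue gain, cost savings, and running cost for each model and ranks the models by net value. Every name and figure here is hypothetical, and the formula is an assumed simplification: it covers only revenue and cost, and does not capture efficiency gains that are hard to express in money.

```python
from dataclasses import dataclass


@dataclass
class ModelValue:
    """Estimated yearly business impact of one model (all fields are assumptions)."""
    name: str
    revenue_gain: float   # extra revenue attributed to the model
    cost_savings: float   # costs avoided thanks to the model
    running_cost: float   # infrastructure and maintenance cost

    @property
    def net_value(self) -> float:
        # Simplified: value delivered minus what it costs to keep the model running.
        return self.revenue_gain + self.cost_savings - self.running_cost


def value_report(models):
    """Return (name, net_value) pairs, highest net value first."""
    rows = [(m.name, m.net_value) for m in models]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical models and figures.
models = [
    ModelValue("demand_forecast", revenue_gain=50_000.0, cost_savings=40_000.0, running_cost=45_000.0),
    ModelValue("churn_model", revenue_gain=120_000.0, cost_savings=0.0, running_cost=30_000.0),
    ModelValue("invoice_ocr", revenue_gain=0.0, cost_savings=80_000.0, running_cost=20_000.0),
]

report = value_report(models)
for name, net in report:
    print(f"{name}: {net:,.0f}")
```

The hard part is not the arithmetic but agreeing with stakeholders on how the revenue and savings estimates are attributed to each model, so it helps to document those assumptions alongside the numbers.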
Establishing value as a performance metric will help data scientists take a deeper look at a project's scope and approach it with best practices to ensure the end result positively impacts the business.
A data science leader can also measure value based on the quality of the team members' work. Data scientists, engineers, architects, annotators, everyone really, will create processes, labels, and models that contain errors, irrespective of their experience. But constant repetition of the same error will reflect poorly on the operationalized model and the team. Evaluating team members based on the value of their contributions will encourage a more conscious effort to deliver high-quality models.
4. Structure the team according to company size
As a data science team leader, the onus is on you to structure the team. Should all your data scientists sit in one core department? Or should they be distributed and closer to their functional teams? Or somewhere in between? It depends on your company's size.
When a team is just starting, it's best to keep the data scientists together in a centralized team structure in one core department. This structure ensures the company uses its data science resources efficiently and encourages growth, but it can also place the data scientists far from the problems they need to solve.
For a much larger enterprise team, like the data science team at Google or Amazon, it is practical to have a decentralized team structure where you distribute the data scientists to the functional teams they support. Projects can run more efficiently, but the distributed data scientists might feel isolated from their peers and potentially miss out on learning opportunities and mentoring.
For established teams, a hybrid structure is often the most effective if the team isn’t too large. In this case, there is a central data science team, but data scientists can be distributed temporarily to functional teams when necessary.
Data science is a "people" sport.
Managing data science teams requires a combination of a sound strategy, well-designed processes, and passionate, productive people. While each of these factors influences the value delivered by data science projects, having the right people and team structure in place will lead to better strategies and more effective processes.
Photo Credit: one-man band 4 by randychiu via flickr; modified to black and white.