While businesses have learned to leverage machine learning, there is still some confusion about how to structure and organize an ML team. Nowadays, it’s not uncommon for companies to turn to outside machine learning consultants for help with exactly this question. In this article, we will discuss the distribution of roles in ML teams and outline important tips for assembling them in both medium-sized and large organizations.
In short, data scientists build mathematical models that are tasked with interpreting data and predicting certain outcomes.
Besides having deep technical expertise, strong data scientists are also critical thinkers. There are countless examples of perfectly built models producing biased results. Data scientists often need to rely on common sense and stay aligned with the overall business vision and objectives.
Outside the context of building an ML team, data scientists build ML pipelines that help businesses make data-driven decisions. Here too, on top of a solid technical foundation, business stakeholders cite the ability to communicate complex model outcomes as one of a data scientist’s most critical skills.
Sometimes companies separate the roles of data analyst and data scientist. In this case, data scientists build models, while data analysts detect trends and draw conclusions from the model outcomes. Which approach is best should come down to the skills of the particular person. If the data scientist you hired is an exceptional model developer but lacks the skills or experience to analyze results, hiring a data analyst might be the right choice.
Data is the bread and butter of any ML project. Data engineers are responsible for building the infrastructure that collects, stores, transforms, and distributes data. They manage exactly how the data is gathered, processed, and stored in a data repository.
Data engineers spend a considerable amount of time cleaning data, which is rarely engaging but is still an extremely important process. In order for data scientists to be the most effective and focus on what they do best (building models), data engineers need to ensure that the data is consistent, cleaned, accessible, and ready to be used.
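To make "consistent, cleaned, and ready to be used" a bit more concrete, here is a minimal sketch of a cleaning step in Python. The function name, the record layout, and the specific rules (trim and lowercase strings, drop rows with missing values, drop exact duplicates) are illustrative assumptions, not a prescribed pipeline; real projects will have their own rules and tooling.

```python
def clean_records(records):
    """Return a cleaned copy of `records` (a list of flat dicts).

    Hypothetical rules chosen for illustration:
    - string values are stripped and lowercased,
    - rows containing None or empty strings are dropped,
    - exact duplicate rows (after normalization) are dropped.
    """
    seen = set()
    cleaned = []
    for row in records:
        # Normalize string fields so "Alice " and "alice" compare equal.
        normalized = {
            key: value.strip().lower() if isinstance(value, str) else value
            for key, value in row.items()
        }
        # Skip rows with any missing value.
        if any(value is None or value == "" for value in normalized.values()):
            continue
        # Use a hashable fingerprint of the row to detect duplicates.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


raw = [
    {"user": " Alice ", "age": 30},
    {"user": "alice", "age": 30},   # duplicate after normalization
    {"user": "Bob", "age": None},   # missing value
]
cleaned = clean_records(raw)
```

With the sample input above, only one normalized row for "alice" survives. Keeping rules like these in one small, tested function is one way for data engineers to give data scientists a predictable input.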
In large ML projects, data engineers can end up too isolated from data scientists. This happens because, in theory, data engineers can deliver without knowing much about ML. Previously, when data engineers mostly prepared data for business intelligence applications, the lack of communication with data scientists rarely affected productivity. However, because data scientists’ methods in the ML realm are continuously changing and evolving, communication becomes critically important. Data engineers often need to modify data sets based on the data scientists’ input.
Data engineers prepare the data that data scientists use to build a mathematical model. ML engineers close the production cycle by deploying the model. In other words, they ensure that the model built by data scientists can be easily operated in the environment of choice. While data scientists sometimes take on ML engineers’ responsibilities, they often lack part of the required skillset. For example, a model can be trained or deployed in the cloud, which would require knowledge of Kubernetes, a tool that data scientists are rarely familiar with.
ML engineers are also usually responsible for MLOps, which means that they manage the training, versioning, and maintenance of models. On top of that, ML engineers monitor model performance and usually report to data engineers about any changes needed in the underlying data.
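As a rough illustration of the monitoring part, the sketch below compares recent model accuracy against a baseline and flags when the drop is large enough to raise with the data engineers. The function name, the use of accuracy as the metric, and the 0.05 tolerance are assumptions made for this example; the right metric and threshold depend on the project.

```python
def needs_data_review(baseline_accuracy, recent_accuracies, tolerance=0.05):
    """Return True if average recent accuracy fell more than `tolerance`
    below the baseline. All thresholds here are illustrative."""
    if not recent_accuracies:
        # No recent evaluations yet, so there is nothing to flag.
        return False
    recent_mean = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent_mean) > tolerance


# Hypothetical evaluation results for two periods.
degraded = needs_data_review(0.90, [0.80, 0.82])
stable = needs_data_review(0.90, [0.89, 0.91])
```

In this example the first call flags a problem and the second does not. A check like this could run on a schedule, with a flagged result prompting a conversation about whether the underlying data has changed.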
Essentially, an ML engineer is a software engineer with solid expertise in data science. With ML engineers taking care of the ML model going into production, data scientists can focus on refining the model.
Tips for building ML teams
Now that we’ve covered the most essential roles in an ML team, let’s talk about the best team structures and frameworks for both medium-sized and large organizations.
Given that ML talent is scarce and expensive, medium-sized businesses will most likely need to prioritize some ML roles over others. An effective way to go about it is to hire one ML expert with a proven track record of successful projects and surround that person with data engineers. The key here is to establish a team dynamic in which the data engineers are willing to learn from the ML expert. In most cases, data engineers will be able to upskill and gain relevant ML experience. This way, slowly but surely, a software development team with an ML expert at its core will transform into a full-fledged ML team.
Beyond that, it’s important to hire more ML engineers than data scientists. More often than not, an ML team works hard to build the best possible model, only for it to fall short of the business goal. Such misalignment can be avoided by speeding up the production cycle: the sooner a model is deployed, the sooner problems in its architecture are revealed.
The single most detrimental mistake large companies make is building their own ML tech stack. Understandably, ML practitioners get excited about developing their own neural network architectures and diving deep into research. However, such initiatives rarely bring any value. Most of the time, ambitious data scientists and ML engineers simply re-create architectures that already exist, while even teams at Facebook and Google use open-source models and frameworks. In the majority of cases, unless the application is extremely unconventional, the range of available tools is more than enough to develop functional ML applications.