Machine Learning (ML) has business potential in almost every industry, and more companies are experimenting with its value. Experiments are necessary but rarely bring lasting change and value to a company. Only when the ML application is successfully put into production and an operational setup is designed around it can insights and automation develop day by day.
The transition from experimenting with ML to launching business-critical ML applications is often difficult and can create uncertainty in any IT organization. ML solutions are complex and have more moving parts than a standard software application. One reason is that data changes constantly: the model is trained on yesterday’s data but must be applied to production data, that is, tomorrow’s data. In the quest to optimize ML models, their characteristics, called hyperparameters, will also change over time.
From a top-down perspective, an ML project can be divided into four phases: business clarification, development, implementation, and operation. Many companies want to "release" the developers, who are the company's data scientists, from the initial ML model once it goes into production so that they can focus on what they do best. The model is then managed by a centralized unit, such as an operations or support team within the IT or BI department. This can be a good idea, but it is not straightforward. There is a risk that models in production will perform poorly and deliver incorrect or unintended results, at the cost of business value and reputation.
In recent years, the concept of ML Ops has emerged, framing a set of best practices for managing the ML model lifecycle. Here, principles from traditional software development are adapted to ML, contributing to the standardization and streamlining of processes as much as possible. In this blog post, we will bring ML Ops down to earth and point out five elements that we believe improve the chances of successfully launching ML applications and reaping scalability opportunities where they exist.
Here are five tips for getting machine learning into production:
1. Maintain operations
An ML application often produces outputs like sales forecasts, predictions of which machines should be maintained to avoid breakdowns, or how to adjust the gates at a treatment plant to handle the pressure of an upcoming rainstorm. The most critical aspect of an ML system is that it provides valid information to the business and that the solution is up and running.
To ensure the solution works, efforts must be made on update strategies, rollbacks, and health checks from the application. If you have a business-critical ML solution, you cannot underestimate the importance of including tests that help ensure an update doesn’t inadvertently break something that previously worked well.
It is therefore important to adopt the end-user’s perspective and design a range of tests that truly check the functionality and results that the end-user values most. For example, the tests could cover the integration with the database, the model’s predictive ability on a known data set, or the UI functionality if it is important to the user.
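As a minimal sketch of such a test, the snippet below checks a model's predictive ability on a known data set. The stand-in model, the sample data, and the error tolerance are hypothetical and only serve as illustration; in practice the check would call the deployed model and use a data set agreed with the business.

```python
from typing import Callable, Sequence, Tuple


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between known targets and predictions."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def check_known_dataset(
    predict: Callable[[Tuple[float, ...]], float],
    features: Sequence[Tuple[float, ...]],
    expected: Sequence[float],
    max_mae: float,
) -> bool:
    """Return True if the model stays within the agreed error tolerance."""
    predictions = [predict(row) for row in features]
    return mean_absolute_error(expected, predictions) <= max_mae


# Hypothetical stand-in for the real model (assumption, not from the post).
def toy_model(row: Tuple[float, ...]) -> float:
    return 2.0 * row[0] + 1.0


# Hypothetical known data set with targets the business has signed off on.
KNOWN_FEATURES = [(1.0,), (2.0,), (3.0,)]
KNOWN_TARGETS = [3.1, 4.9, 7.2]

if __name__ == "__main__":
    ok = check_known_dataset(toy_model, KNOWN_FEATURES, KNOWN_TARGETS, max_mae=0.2)
    print("known-data check passed" if ok else "known-data check FAILED")
```

A check like this can run as part of every update, so that a release is blocked before it reaches the end-user if the model no longer reproduces results that previously worked well.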
2. Continuous validity checks enhance credibility
Data is the foundation of every ML model, and as previously described, data is always in motion—whether it’s text, images, audio, or tabular data. The fundamental characteristics that define new data can suddenly differ from what the model was trained on. For example, COVID-19 has disrupted many forecast models in production, which struggle to automatically handle such a severe data shock. COVID-19 is a clear example that we can all relate to, but many times the danger lies in the unseen. An ML model often becomes less accurate over time as the underlying characteristics of the data change.
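One simple way to make the unseen visible is to compare recent production data with the data the model was trained on. The sketch below measures how far the mean of a recent sample has moved, expressed in training standard deviations. It is deliberately simplistic and not a full drift test; the sample values and the alert threshold of 3.0 are assumptions chosen for illustration.

```python
import statistics
from typing import Sequence


def mean_shift_score(training: Sequence[float], recent: Sequence[float]) -> float:
    """Distance between recent and training means, in training standard deviations."""
    mu = statistics.fmean(training)
    sigma = statistics.stdev(training)
    if sigma == 0:
        raise ValueError("training data has no variation; score is undefined")
    return abs(statistics.fmean(recent) - mu) / sigma


# Hypothetical threshold; the right value depends on the domain.
ALERT_THRESHOLD = 3.0

# Hypothetical feature values, e.g. daily demand.
TRAINING_SAMPLE = [100.0, 102.0, 98.0, 101.0, 99.0]
STABLE_SAMPLE = [101.0, 99.0, 100.0]
SHOCKED_SAMPLE = [60.0, 55.0, 65.0]

if __name__ == "__main__":
    for name, sample in (("stable", STABLE_SAMPLE), ("shocked", SHOCKED_SAMPLE)):
        score = mean_shift_score(TRAINING_SAMPLE, sample)
        status = "ALERT" if score > ALERT_THRESHOLD else "ok"
        print(f"{name}: shift score {score:.1f} ({status})")
```

A check on the mean alone will miss changes in spread or shape, so it is best seen as a first alarm bell rather than a complete monitoring setup.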
A solution to this, often within reach, is to retrain the model to ensure it adapts to new data. Whether it is advantageous to automatically retrain a model often depends on the domain, but many times a manual analytical review of the model’s performance and characteristics is needed to ensure that the retrained model remains valid. For this process, it is good to work with a golden data set that the business targets, validates, and continuously maintains. This provides a common language for performance and thus a clearer understanding of whether the retraining or model update was beneficial.
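The comparison on a golden data set can be sketched as follows. Both models are scored on the same golden examples, and the retrained model is only considered an improvement if it lowers the error by a minimum margin. The two stand-in models, the golden examples, and the choice of mean absolute error as the metric are assumptions made for this illustration.

```python
from typing import Callable, Sequence, Tuple

GoldenSet = Sequence[Tuple[float, float]]  # (input, validated target) pairs


def golden_error(predict: Callable[[float], float], golden: GoldenSet) -> float:
    """Mean absolute error of a model on the golden data set."""
    if not golden:
        raise ValueError("golden data set must not be empty")
    return sum(abs(predict(x) - y) for x, y in golden) / len(golden)


def retraining_was_beneficial(
    current: Callable[[float], float],
    candidate: Callable[[float], float],
    golden: GoldenSet,
    min_gain: float = 0.0,
) -> bool:
    """True if the candidate reduces golden-set error by more than min_gain."""
    return golden_error(current, golden) - golden_error(candidate, golden) > min_gain


# Hypothetical golden data set maintained by the business.
GOLDEN = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


# Hypothetical stand-ins for the model in production and the retrained model.
def current_model(x: float) -> float:
    return 1.5 * x


def retrained_model(x: float) -> float:
    return 2.0 * x


if __name__ == "__main__":
    print("promote retrained model:",
          retraining_was_beneficial(current_model, retrained_model, GOLDEN))
```

Because both models are measured against the same validated examples, the result is easy to discuss with the business, which is the point of the golden data set.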
Similarly, it is advantageous to continuously take a self-critical look at the ML model’s predictions. Here, you can work with control groups or A/B tests to highlight what actually happens when the business does not act on the model’s results. This reveals self-fulfilling prophecies, where, for example, customers change their behavior because the business contacts them as a result of the ML model’s predictions.
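A control group comparison of this kind can be sketched in a few lines. Among the customers the model flags, some are deliberately not contacted, and the outcome rates of the two groups are compared. The churn scenario, the record fields, and the sample data are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Customer:
    flagged: bool    # the model predicted the outcome (here: churn)
    contacted: bool  # the business acted on the prediction
    churned: bool    # what actually happened


def churn_rate(customers: Sequence[Customer]) -> float:
    """Share of customers in the group who churned."""
    if not customers:
        raise ValueError("group must not be empty")
    return sum(c.churned for c in customers) / len(customers)


def control_group_comparison(customers: Sequence[Customer]) -> Tuple[float, float]:
    """Return (churn rate when contacted, churn rate in control) for flagged customers."""
    flagged = [c for c in customers if c.flagged]
    treated = [c for c in flagged if c.contacted]
    control = [c for c in flagged if not c.contacted]
    return churn_rate(treated), churn_rate(control)


# Hypothetical observations: 4 contacted and 4 held back as a control group.
SAMPLE: List[Customer] = (
    [Customer(True, True, True)] + [Customer(True, True, False)] * 3
    + [Customer(True, False, True)] * 3 + [Customer(True, False, False)]
)

if __name__ == "__main__":
    treated_rate, control_rate = control_group_comparison(SAMPLE)
    print(f"churn when contacted: {treated_rate:.0%}, in control group: {control_rate:.0%}")
```

The control group shows how accurate the model is when nobody intervenes, while the gap between the two rates indicates the effect of acting on the predictions. With real data, the group sizes need to be large enough for the difference to be meaningful.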
3. Find scaling potential
The development of ML models is largely tailored to the individual application, and the development phase itself therefore scales poorly across the board. For example, an HR department’s employee turnover model can hardly be reused for inventory forecasts in the sales department. The exception is areas that are closer to a standard problem, where, say, a power forecast model developed for one wind farm can largely be adapted to new wind farms. Fortunately, there are other areas in the value chain where scalability can really be achieved.
Deployment
Deployment pipelines for an ML application can often be reused directly in the next one, as many of the technical requirements recur. For example, the applications must be able to interact with a SQL server as well as a range of cloud services, install certain Python/R packages, and be part of a Docker context to ensure stable execution over time. One of the newer additions is the ability to configure multi-stage pipelines, which define the phases of continuous integration and continuous deployment based on source code. Similarly, the configuration of the infrastructure on which the solution runs can be specified in code, known as infrastructure-as-code. Together, they create a strong framework for scaling, as deployment pipelines are based on source code that can be reused.
Approach
Similarly, standardization can be leveraged within the business. An ML application is often seen in the context of other data applications, where, for example, the nightly ML prediction job should run right after the data in the data warehouse has been updated. Therefore, the triggering and scheduling of the applications are handled centrally, and thus the pipeline that handles the ML application can be reused across applications. Processes for setting up and monitoring application logging, setting up alerts, and creating support tickets can also be reused.
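The idea of centrally handled triggering can be illustrated with a small dependency graph, where each job lists the jobs that must finish before it starts. The sketch uses Python's standard `graphlib` module (Python 3.9+); the job names are hypothetical, and a real setup would typically rely on the scheduling tool the organization already uses.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical job graph: each job maps to the jobs it depends on.
# The nightly prediction job must wait for the data warehouse refresh.
JOB_DEPENDENCIES: Dict[str, Set[str]] = {
    "warehouse_refresh": set(),
    "ml_prediction": {"warehouse_refresh"},
    "publish_forecast": {"ml_prediction"},
}


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return the jobs in an order that respects all dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for job in execution_order(JOB_DEPENDENCIES):
        print("run:", job)
```

Because the dependencies are plain data, a new ML application can be added by extending the graph rather than writing new scheduling logic, which is where the reuse comes from.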
4. Who does what?
Two ways to organize data science:
Overall, there are two models for organizing data science profiles. The first is the central and functional organizational model, where machine learning and data science profiles are centrally placed in a shared service function. The advantage of this model is that there isn’t much downtime, as the critical mass is larger and knowledge sharing has optimal conditions. The disadvantage is that data science is not directly tied to a critical business area, making it harder for the business to understand how the team can be utilized. Ideas often come from the data science team and may not necessarily align with the business’s roadmaps.
The second is the decentralized organizational model, where expertise is tied to the business’s responsibilities and products. It is used by many large companies where data insights are central to their operations. Here, data science profiles sit alongside data engineering teams, often under the same product managers, and are thus directly linked to a business-critical function. The downside is that knowledge sharing and sparring are more difficult, as the data science team is often smaller in this model.
Of course, a mix of the two models can be used, with a greater focus on project-based work, which can introduce overhead when it comes to knowledge sharing and communication.
The right choice naturally depends on the size of the company and its assessment of the potential of data science and machine learning. It also depends on whether the company has committed to a data strategy where data science is an integral part of the decision-making process, or whether data science is seen more as a provider of statistics and ad hoc analyses.
Role distribution for implementation:
The responsibility for developing the ML model is clear. However, the role distribution for deploying and operating ML applications is often under discussion, and there is no single responsibility model that works everywhere. The point at which responsibility for the model is transferred from the developer to the next person in the chain depends on the maturity level of ML in the company, as well as the development team’s experience with deployment frameworks and DevOps methods. Until the ML model has been tested in a test environment, it’s not really known where the battle will be fought. This argues for developers being an important part of the deployment process, not least because it is difficult for others to understand the errors that occur along the way. For example, is an error due to the source code, the integration with external services, or the Docker image, or is the hardware the bottleneck? Newer cloud tools make it easier to increase reusability and thus bring developers closer to the process.
Role distribution for operations
In large organizations, there is often a central unit responsible for the operation of ML and other data applications. Given the complexity, it may seem like an overwhelming task to keep something running in production that is constantly changing and that you did not develop yourself. To address this, it is good to formulate a set of concrete requirements and standards that developers must follow before the model is handed over to production. The solution should be set up with logging, alerts, documentation of error types, and a support plan so that it is clear who does what in different scenarios. It is also beneficial to introduce an Acceptance Gate, where the operations team evaluates whether the model’s support tools meet the requirements before the operation of the ML model is handed over.
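An Acceptance Gate can be as simple as a checklist that is evaluated before handover. The sketch below uses the four items mentioned above (logging, alerts, documentation of error types, and a support plan); the function names and the way the handover status is recorded are hypothetical.

```python
from typing import List, Mapping

# Requirements taken from the text; the identifiers themselves are illustrative.
REQUIRED_ITEMS = (
    "logging",
    "alerts",
    "error_type_documentation",
    "support_plan",
)


def missing_requirements(handover: Mapping[str, bool]) -> List[str]:
    """List the required items that are absent or not yet fulfilled."""
    return [item for item in REQUIRED_ITEMS if not handover.get(item, False)]


def passes_acceptance_gate(handover: Mapping[str, bool]) -> bool:
    """The model may be handed over only when nothing is missing."""
    return not missing_requirements(handover)


if __name__ == "__main__":
    # Hypothetical status reported by the development team.
    status = {"logging": True, "alerts": True, "support_plan": False}
    if passes_acceptance_gate(status):
        print("handover accepted")
    else:
        print("handover blocked, missing:", ", ".join(missing_requirements(status)))
```

Writing the requirements down in this form gives developers and the operations team the same, unambiguous definition of when a model is ready for handover.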
5. Standards, standards, standards
Just as standards help the operations team in the handover of the ML model, you can work with familiarity and structure in the deployment phase.
Want to know more? Reach out to us via the form below, and we will get back to you!