Deploying machine learning models: Why is it so hard?

In this article, one of BasisAI's resident data scientists reflects on the challenges she has faced when attempting to deploy ML solutions. She also outlines the data science workflows employed by data science teams today, the main challenges associated with each, and how to navigate this complex landscape to get more robust models into production faster.

Data science workflows vary little up to deployment

The deployment process for machine learning microservices varies significantly from company to company, and even across teams within the same organisation. As a data scientist, this variation can make it hard to ensure your models make it into production systems and have an impact there.

Companies are increasingly leveraging ML to solve a diverse range of business problems. There is no single established data science workflow, but most data scientists at large technology companies follow some variation of this:

[Figure: CRISP-DM process flow diagram [2]]

Typically, a data scientist's workflow is iterative; it is never a one-way path from data cleaning to feature generation to a model they are satisfied with. However, once there is a working model ready to be deployed to production, approaches start to diverge. A survey by Rexer Analytics [1] showed that deployment is consistently ranked among the top challenges for data scientists, and that only 40% of data scientists get their models deployed within days. Here I explore the challenges I have faced as a data scientist in deploying ML solutions.

Current state of model deployment

Why do we need to deploy our models?

Deploying models to production means integrating the model with existing services in the production environment by creating endpoints which other services can call. This is crucial if businesses want to leverage ML at scale. 
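For example, once a model sits behind an endpoint, any downstream service can get predictions with a plain HTTP call. Here is a minimal sketch in Python; the URL, payload schema, and response shape are hypothetical:

```python
import requests

# Hypothetical endpoint for a deployed churn model; the URL, feature
# names, and response format below are illustrative assumptions.
resp = requests.post(
    "https://models.internal.example.com/churn/predict",
    json={"features": {"tenure_months": 14, "monthly_spend": 42.5}},
    timeout=2.0,  # downstream callers usually need a latency budget
)
resp.raise_for_status()
print(resp.json())  # e.g. {"churn_probability": 0.27}
```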

How are data scientists currently deploying their models?

1. Pass the models to DevOps/engineers to implement in production code

For some data scientists (especially those in large organisations with significant engineering and DevOps resources), deployment is often viewed as a black box. In one of my previous companies, a large technology company with more than a thousand engineers, bringing a model to production involved throwing the model over the wall to the engineers and waiting for them to work their magic. An engineer would attempt to understand the model based on the documentation and code the data scientist provided, and would then rewrite the code in the language of the production system. The main problems with this approach are:

  • Risk of translation error. Even with documentation and a handover of code, there would usually still be back-and-forth between the engineer and the data scientist to clarify the logic. Bugs are easily introduced when the engineer rewrites the code.
  • Long process to model deployment. Data scientists do not have control over the deployment timeline. Models can take months to get to production (if they even get deployed).
  • Models not getting deployed. Due to the complexities of model deployment and ongoing maintenance, models end up sitting in repositories and never see the light of day. This is unfortunate, as a model only delivers value once it is deployed.
  • Changes to model logic. As production languages may not support some of the newer ML and deep learning frameworks, models may have to be simplified in order to be implemented in the production language, potentially leading to worse performance.
2. Hand-deploy models with DIY solutions

Data scientists working in teams with limited access to engineering resources face a different problem: they often have to hand-deploy their models using some DIY solution. To illustrate this, I challenged myself to manually deploy a model. I have no software engineering experience and did not know where to begin, so I started by trying various frameworks and reading potentially relevant documentation. After numerous attempts, these are the steps I eventually took to deploy my model:

  1. Write a Flask app to serve the ML model (a minimal sketch follows this list)
  2. Dockerize the app by defining a Dockerfile that installs the dependencies, including Flask and Gunicorn (Gunicorn is a WSGI application server that passes requests from the web server to the Flask app).
  3. Provision Kubernetes cluster on GCP (Configure instance type and nodes)
  4. Create Kubernetes manifest files 
    1. Deployment (Specify image and number of replicas for scaling)
    2. Service (Expose service externally using load balancer)
    3. Horizontal pod autoscaler (Scale number of pods based on CPU utilization)
    4. Config map (Decouple configuration from Docker image)
    5. Persistent volume and persistent volume claim (Store model artefact)
  5. Deploy the Docker image on Kubernetes (using kubectl) and test the endpoint
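For step 1, here is a minimal sketch of the kind of Flask app I ended up with, assuming a scikit-learn model serialised to a file; the file name, routes, and payload schema are illustrative:

```python
# app.py -- a minimal model-serving sketch, assuming a scikit-learn
# model serialised to model.joblib; names and paths are illustrative.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # load the artefact once at startup

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    features = [payload["features"]]  # one row of feature values
    prediction = model.predict(features)[0]
    return jsonify({"prediction": float(prediction)})

@app.route("/healthz", methods=["GET"])
def healthz():
    # Kubernetes liveness/readiness probes can hit this endpoint
    return "ok", 200
```

In production the app would sit behind Gunicorn rather than Flask's development server; the Dockerfile in step 2 might end with something like `CMD ["gunicorn", "--bind", "0.0.0.0:8080", "app:app"]` (hypothetical port and module names).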

I managed to get a successful deployment after following several tutorials, and I would be able to follow this playbook should I need to hand-deploy models in the future. However, model deployment is not the endgame. There are various issues I have not considered:

  • Lack of knowledge configuring deployment settings. Admittedly, I do not have a good grasp of the intricacies of the system and would have trouble debugging should problems arise. In addition, I used a toy example that is not representative of typical production workloads. In a real-world setting, scaling, latency, and cost are important considerations, and data scientists may not have the expertise to set the appropriate configurations.
  • Difficulty monitoring endpoints. Data scientists would have to configure Datadog, Prometheus, or a similar tool to monitor their endpoints (one way to instrument a model server is sketched after this list).
  • Difficulty logging. This includes logging requests, responses, and errors. Options include the ELK stack or Fluentd.
  • Lack of a uniform framework. This is just one of many ways to deploy a model. Because there is no uniform framework, data scientists may find it difficult to hand off their deployed models to their peers.
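To make the monitoring and logging points concrete, here is a sketch of how the Flask app above could be instrumented with request logging and Prometheus metrics; the metric and logger names are my own illustrative choices:

```python
# A minimal sketch of a Flask model server instrumented with request
# logging and Prometheus metrics; metric names are illustrative.
import logging
import time

import joblib
from flask import Flask, jsonify, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model-server")

app = Flask(__name__)
model = joblib.load("model.joblib")

REQUESTS = Counter("prediction_requests_total", "Total prediction requests")
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency in seconds")

@app.route("/metrics")
def metrics():
    # Prometheus scrapes this endpoint on its own schedule
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}

@app.route("/predict", methods=["POST"])
def predict():
    REQUESTS.inc()
    start = time.time()
    try:
        payload = request.get_json(force=True)
        prediction = float(model.predict([payload["features"]])[0])
        logger.info("prediction=%s", prediction)  # requests/responses end up in the logs
        return jsonify({"prediction": prediction})
    except Exception:
        logger.exception("prediction failed")  # errors are logged with tracebacks
        raise
    finally:
        LATENCY.observe(time.time() - start)
```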


Additional challenges

Whether you pass your models to DevOps for deployment or hand-deploy them, you are likely to encounter the following challenges:

  • Difficult to maintain. As data scientists continue to iterate on the models, they may find it challenging to keep track of the model version and parameters they have used for training. As a result, it may also be difficult to reproduce model predictions. In addition, there is no easy way of tracking which model versions have been deployed to the endpoints.
  • Difficult to monitor model performance and detect model rot. As most current deployment frameworks were not created with ML workflows in mind, they do not make it easy to monitor model performance over time. For instance, the integrity of ML workloads depends on the stability of the input data: models may degrade as the input data changes, but current frameworks make it difficult to track shifts in its distribution (a simple drift check is sketched after this list).
  • ML systems are complex. We often see job postings for full-stack data scientists, but most data scientists do not have experience managing the entire lifecycle of a model. The model is just a small part of the overall infrastructure, as Google's paper on hidden technical debt explains well. To manage the model's lifecycle, the data scientist would first need a full understanding of that infrastructure.

[Figure: ML system infrastructure. Source: Hidden Technical Debt in Machine Learning Systems]

  • Companies have recognised that deploying ML workflows is a significant issue, so much so that many large technology companies have dedicated resources to building their own deployment platforms. For example, Uber has Michelangelo and Google has TFX. However, building such a platform in-house does not make sense for most companies.
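As a concrete example of the kind of input-drift check that deployment frameworks rarely give you out of the box, here is a minimal sketch comparing one numeric feature in recent serving data against its training distribution using a two-sample Kolmogorov-Smirnov test; the data and significance threshold are illustrative:

```python
# A minimal input-drift check for a single numeric feature, using a
# two-sample KS test; synthetic data and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training distribution
live_feature = rng.normal(loc=0.4, scale=1.0, size=1_000)    # recent serving requests

statistic, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:  # illustrative significance threshold
    print(f"Possible input drift detected (KS={statistic:.3f}, p={p_value:.1e})")
```

In practice a check like this would run on a schedule over logged request features, alerting the team when the serving distribution departs from what the model was trained on.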

 

Future state of model deployment

Data scientists should have to worry less about deployment and instead spend their time doing what they do best: understanding the data and its context deeply, and building the best models to extract the greatest predictive power. We recently launched Bedrock, a machine learning platform that takes away deployment complexities, removing frustration from data teams and allowing them to easily manage all aspects of the model lifecycle.

The future of model deployment doesn’t have to be complex, convoluted and challenging. It can be seamless, secure, rapid, robust...

...and done in 1 click.


Get more information on Bedrock or speak to an ML deployment expert via contact@basis-ai.com


References