Machine Learning Deployment – Part II: Production Considerations

This post continues a previous series on machine learning deployment.

The level of rigor a model deployment requires depends on a number of factors, including the impact of the model, resource availability, and the functional and non-functional requirements of the deployment. As discussed in the first post of this series, deployments can, and often should, be treated like regular software. That means applying traditional software engineering practices, including but not limited to maintainability, security, logging, testing, source control, and proper error handling. In addition, practices specific to model building and live monitoring are essential.

Sergei Izrailev, Chief Data Scientist at Beeswax, published a great deck on several types of deployments, “Design Patterns for Machine Learning in Production”. It’s important to educate your business owners on the difference between a prototype and a production environment, and why the latter is worth investing in.

Questions to Ask

Model deployments have functional and non-functional requirements that depend on the impact and intended use of the model. Here are some of the questions worth asking:

  • Are the predictions generated in batch on a schedule, or are they needed in real time?

  • What are the uptime requirements?

  • Are there regulatory requirements?

  • What are the security requirements?

  • Is the training data in a different format than the production data?

  • How should monitoring be handled at both the application level and the model level? It’s important to watch for issues such as feature drift and outliers; a minimal drift check is sketched after this list.

  • What does re-training look like, and who is responsible for it? How do you handle QA and automated testing?
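
On the monitoring question, one lightweight way to check for feature drift is to compare the live distribution of a feature against its training distribution. The sketch below uses a two-sample Kolmogorov-Smirnov test; the test, the alpha threshold, and the function names are illustrative assumptions rather than anything prescribed here.

    # A minimal feature-drift check, assuming training and live values for a
    # feature are available as 1-D numeric arrays. The alpha threshold is an
    # illustrative choice and should be tuned per feature.
    import numpy as np
    from scipy.stats import ks_2samp

    def detect_drift(train_values, live_values, alpha=0.01):
        """Return True if the live distribution differs significantly
        from the training distribution."""
        statistic, p_value = ks_2samp(train_values, live_values)
        return p_value < alpha

    # Example: compare recent live traffic against the training sample.
    rng = np.random.default_rng(seed=42)
    train_sample = rng.normal(loc=0.0, scale=1.0, size=10_000)
    live_sample = rng.normal(loc=0.4, scale=1.0, size=2_000)  # shifted mean

    if detect_drift(train_sample, live_sample):
        print("Feature drift detected - consider re-training.")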

There are many different approaches you can take to deploy a model, depending on the language it was built in (Python, R, Java, C++) and on whether you serialize the model, pre-cache the predictions, or use a portable format such as PMML. Do you have dedicated resources available to develop an end-to-end in-house tool for training and deployment? It could make more sense for your company to work with one of the numerous start-ups trying to automate the model deployment process, and the major cloud vendors also offer off-the-shelf model deployment tools worth investigating as you begin this journey. There is no single right method; the correct solution will be unique to your business.
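
As a concrete example of the serialization route, here is a minimal sketch that persists a scikit-learn model with joblib so a separate serving process can load it later. The library, dataset, and file name are assumptions for illustration.

    # Train a model, save the fitted artifact, and load it back the way a
    # serving application would. joblib handles scikit-learn objects well.
    import joblib
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)

    # Persist the fitted model to disk at training time...
    joblib.dump(model, "model-v1.joblib")

    # ...and load it back inside the serving process.
    restored = joblib.load("model-v1.joblib")
    print(restored.predict(X[:5]))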

Deploying Models in Practice

It is a known data science anti-pattern to develop a model in one programming language and then re-implement and re-train it in a language native to the consuming application for the actual deployment. That process is time-consuming, prone to errors, and likely to sacrifice accuracy.

Consider the following situation: a data scientist has worked for the last few months to train an acceptable model in Python, and it’s ready to go to production. After the initial deployment, the model provides tremendous value to the business. Jump ahead six months, though, and the model has decayed.

We need to be able to find the root cause of the decay, fix it, and deploy the updated model as quickly as possible. Deploying the Python model directly cuts out the time-consuming and error-prone step of re-training the updated model in a different language. This is why exposing models as REST endpoints has become popular.
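
A minimal sketch of such an endpoint, assuming Flask and the joblib artifact from the earlier example (the route name and JSON schema are illustrative choices, not a prescribed interface):

    # Serve predictions from a serialized model over HTTP.
    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    model = joblib.load("model-v1.joblib")  # artifact saved at training time

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expects a JSON body like {"features": [[5.1, 3.5, 1.4, 0.2]]}
        features = request.get_json()["features"]
        predictions = model.predict(features).tolist()
        return jsonify({"predictions": predictions})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)

Because the serving layer only loads an artifact, a data scientist can re-train in Python, swap the file, and redeploy without rewriting anything in another language.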

Training models is similar to traditional software development in some ways and very different in others. Instead of coding a lot of business logic to make specific decisions, the logic is learned from data through training. Training involves working heavily with subject matter experts from the business, manipulating the data to derive predictive features, prepping the data in a format suitable for the model, tweaking hyperparameters, and validating the results through various statistical metrics and against holdout data.
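
The validation step at the end of that workflow might look like the following sketch: hold out a slice of the data, fit on the rest, and check a metric on the holdout. The dataset, preprocessing, model, and metric here are illustrative assumptions.

    # Fit on a training split and validate against data the model never saw.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_holdout, y_train, y_holdout = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Scaling stands in for "prepping the data"; hyperparameters are tweaked
    # per project rather than fixed like this.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    auc = roc_auc_score(y_holdout, model.predict_proba(X_holdout)[:, 1])
    print(f"Holdout AUC: {auc:.3f}")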

When a model has decayed, and a quick re-training on more recent data doesn’t produce an acceptably accurate model, the steps described above will have to be revisited. This entire process is much more manageable if the same programming language used for training is used in production, because the data scientists can pull the model down locally and start investigating immediately. Ideally, monitoring would already be in place to detect potential model decay before it becomes a problem.
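
One simple form of decay monitoring, assuming ground-truth labels eventually arrive for live predictions, is to track rolling accuracy against the accuracy measured at training time. The class name, window size, baseline, and tolerance below are illustrative assumptions.

    # Alert when rolling live accuracy drops below the training baseline.
    from collections import deque

    class DecayMonitor:
        def __init__(self, baseline_accuracy, tolerance=0.05, window=1000):
            self.baseline = baseline_accuracy
            self.tolerance = tolerance
            self.outcomes = deque(maxlen=window)  # rolling correctness flags

        def record(self, prediction, actual):
            self.outcomes.append(prediction == actual)

        def decayed(self):
            if len(self.outcomes) < self.outcomes.maxlen:
                return False  # not enough live data yet
            live_accuracy = sum(self.outcomes) / len(self.outcomes)
            return live_accuracy < self.baseline - self.tolerance

    # Usage: record each (prediction, actual) pair as labels arrive.
    monitor = DecayMonitor(baseline_accuracy=0.92)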

End-to-end Platform

As we mentioned previously, we do not have all the answers and are constantly learning and improving our process. Few companies have made the leap to being truly model-driven as opposed to data-driven, and we hope to be at the forefront of this paradigm shift. In our quest to achieve this, we are constantly on the lookout for companies that have successfully made this jump so we can learn from them. One of the leaders in this space is Uber, which built an in-house machine learning platform, named Michelangelo, to handle its model deployment. A key to the platform's success is that it allows data scientists on the team to quickly update and push out models. Speed to market and the ability to update models in real time are fundamental to transforming into a model-driven company.

Part III of this series has been posted.

Mike Kehayes