The Python data ecosystem provides great tools for quickly getting up and running with machine learning models, but the path to serving them stably in production is much less clear. We'll discuss the details of wrapping a minimal REST API around scikit-learn, training and persisting models in batch, and logging decisions, then compare this approach with some other common ways of productionizing models.
We'll be discussing Simple's implementation of a Python microservice for classifying incoming chat messages by subject category, enabling our customer relations agents to develop specializations and onboard more quickly.
We'll walk through a bit of the code for our model and what the interface looks like in scikit-learn for training a model, persisting it to disk, and requesting a prediction.
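To make that interface concrete, here's a minimal sketch of training, persisting, and loading a text classifier with scikit-learn; the pipeline choices, sample messages, and file name are illustrative placeholders rather than our production code.

```python
# Minimal sketch: train a text classifier, persist it, load it, and predict.
# The pipeline, sample data, and file name are placeholders for illustration.
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy (message, category) training pairs.
messages = ["where is my card?", "how do I set a savings goal?"]
categories = ["cards", "goals"]

# A standard text-classification pipeline: TF-IDF features into a linear model.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
model.fit(messages, categories)

# Persist the fitted pipeline to disk...
joblib.dump(model, "classifier.joblib")

# ...then load it back and request a prediction.
loaded = joblib.load("classifier.joblib")
print(loaded.predict(["I lost my card"]))  # -> an array with one predicted category
```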
Once we understand the shape of interacting with scikit-learn, we'll look at wrapping it in a Flask app and at understanding how that application behaves in production: tracking performance metrics, logging results to a database, and degrading gracefully when things go wrong.
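As a rough sketch of that wrapper (with a hypothetical route name and fallback category, and without the metrics and database plumbing), the Flask app might look something like this:

```python
# Rough sketch of the Flask wrapper; route name, fallback category, and
# logging details are hypothetical stand-ins for the production service.
import logging

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("classifier.joblib")  # load the persisted pipeline at startup
log = logging.getLogger("classifier")

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    try:
        category = model.predict([text])[0]
    except Exception:
        # Degrade gracefully: log the failure and return a default bucket
        # so callers still get a usable response.
        log.exception("prediction failed")
        category = "uncategorized"
    # In the real service the decision would also be written to a database
    # and counted in performance metrics.
    log.info("classified message as %s", category)
    return jsonify({"category": category})
```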
We'll then switch gears to talk about all the work that needs to happen outside of the application itself. We use a separate framework to execute scheduled jobs, periodically retraining the model on new records in our data warehouse or testing out a new iteration of the model code. We evaluate the performance of the models based on historical data, and then update the model running in production when we find a better-performing configuration.
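A simplified version of that retrain-evaluate-promote cycle might look like the sketch below; the data-access helpers, accuracy-based comparison, and file-based promotion are placeholders for the real evaluation logic and job framework.

```python
# Simplified sketch of a scheduled retraining job: fit a candidate model on fresh
# data and promote it only if it beats the current model on held-out records.
# fetch_labeled_messages and build_pipeline are hypothetical helpers.
import joblib
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def retrain(fetch_labeled_messages, build_pipeline, model_path="classifier.joblib"):
    # Pull fresh labeled records, e.g. by querying the data warehouse.
    messages, categories = fetch_labeled_messages()
    train_x, test_x, train_y, test_y = train_test_split(
        messages, categories, test_size=0.2, random_state=0
    )

    # Fit a candidate model and score it on held-out historical data.
    candidate = build_pipeline()
    candidate.fit(train_x, train_y)
    candidate_score = accuracy_score(test_y, candidate.predict(test_x))

    # Score the model currently serving in production on the same hold-out set.
    current = joblib.load(model_path)
    current_score = accuracy_score(test_y, current.predict(test_x))

    # Only replace the production artifact when the candidate performs better.
    if candidate_score > current_score:
        joblib.dump(candidate, model_path)
    return candidate_score, current_score
```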
Finally, we'll discuss how other companies approach the problem of serving predictive models in production. Differences in performance needs, security constraints, and available technical expertise can vastly change the shape of the solution.