Demystifying MLOps: part 1

Solutions & Products
- Solutions & Products
- Cloud Services
  Cloud Services
  
  World-class data management and storage solutions in the biggest public clouds.
  Visit Cloud Services
  
  Solutions
  
  Microsoft Azure
  
  Google Cloud
  
  AWS
  
  IBM Cloud
  
  Products
  
  Azure NetApp Files
  
  Amazon FSx for NetApp ONTAP
  
  Cloud Volumes Service for Google Cloud
  
  Cloud Volumes ONTAP
  
  Compute Optimization
  
  Cloud Sync
  
  Cloud Data Sense
  
  Cloud Tiering
  
  Cloud Backup Service
  
  Cloud Volumes Edge Cache
  
  Global File Cache
  
  Cloud Manager
  
  Astra
  
  Cloud Insights
  
  File Services / File Sharing
  
  MySQL
  
  PostgreSQL
  
  Kubernetes
  
  Quick Links
  
  Cloud Central
  
  Data Fabric
  
  Why NetApp for Cloud Services
  
  Spot by NetApp
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
- Hybrid Cloud
  Hybrid Cloud
  
  Build your business on the best of cloud and on premises together with Hybrid Cloud Infrastructure solutions.
  Visit Hybrid Cloud
  
  Solutions
  
  Virtualization
  
  Service Provider Infrastructure
  
  IT Automation
  
  Private Clouds
  
  VMware
  
  Red Hat
  
  Quick Links
  
  Data Fabric
  
  Why NetApp for Hybrid Cloud
  
  What is Hybrid Cloud
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
- Data Storage
  Data Storage
  
  NetApp is the proven leader when it comes to modernizing and simplifying your storage environment.
  Visit Data Storage
  
  Solutions
  
  SAN
  
  Scale-Out NAS
  
  Unstructured Data Solutions
  
  Products
  
  AFF A-Series
  
  AFF C190
  
  E-Series
  
  EF-Series
  
  FAS
  
  FlexPod
  
  SolidFire
  
  StorageGRID
  
  Disk Shelves & Storage Media
  
  Quick Links
  
  Data Fabric
  
  Why NetApp for Data Storage
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
- Cyber Resilience
  Cyber Resilience
  
  Our industry-leading solutions are built so you can protect and secure your sensitive company data.
  Visit Cyber Resilience
  
  Solutions
  
  Data Protection
  
  Ransomware Protection
  
  Business Continuity / Disaster Recovery
  
  Data Backup and Recovery
  
  Data Compliance
  
  ONTAP Data Security
  
  Products
  
  SnapCenter
  
  Cloud Backup
  
  Quick Links
  
  Data Fabric
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
- Data Management
  Data Management
  
  Get complete control over your data with simplicity, efficiency, and flexibility.
  Visit Data Management
  
  Solutions
  
  Simplicity365
  
  Products
  
  Active IQ
  
  Element Software
  
  OnCommand Insight
  
  ONTAP Data Management
  
  SANtricity Software
  
  Virtual Infrastructure Management
  
  Quick Links
  
  Data Fabric
  
  Data Management Specialists
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
- Enterprise Applications
  Enterprise Applications
  
  Speed application development, improve software quality, reduce business risk, and shrink costs.
  Visit Enterprise Applications
  
  Solutions
  
  SAP
  
  Oracle
  
  MS SQL
  
  Quick Links
  
  Data Fabric
  
  Why NetApp for Enterprise Applications
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
- DevOps
  Devops
  
  Our solutions remove friction to help maximize developer productivity, reduce time to market, and improve customer satisfaction.
  Visit Devops
  
  Solutions
  
  Configuration Management
  
  Containers
  
  Google Clouds Anthos
  
  Continuous Integration Continuous Delivery
  
  Quick Links
  
  Data Fabric
  
  Why NetApp for DevOps
  
  What is DevOps
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
- AI
  AI
  
  NetApp AI solutions remove bottlenecks at the edge, core, and the cloud to enable more efficient data collection.
  Visit AI
  
  Solutions
  
  Big Data Analytics
  
  High Performance Computing
  
  Products
  
  ONTAP AI
  
  Quick Links
  
  Data Fabric
  
  Why NetApp for AI
  
  What is AI
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
- VDI
  VDI
  
  Provide a powerful, consistent end-user computer (EUC) experience—regardless of team size, location, complexity.
  Visit VDI
  
  Products
  
  Spot PC
  
  Virtual Desktop Service
  
  Quick Links
  
  Data Fabric
  
  What is VDI
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
- Services
  Services
  
  We have a service for your every need, plus the ones you’re about to discover.
  Visit Services
  
  Services
  
  Professional Services
  
  Support Services
  
  Quick Links
  
  Data Fabric
  
  Customer Stories
  
  Test Drive
  
  Free Trials
  
  How to Buy
Support & Training
How to Buy
Community

ML in production is challenging but not impossible

ML has advanced a lot, and we are privileged to have most of the resources that we need at our disposal. We have access to compute resources (on premises and in the cloud), to the necessary quantity and quality of datasets, and to state-of-the-art ML research. ML systems are also being streamlined. Data engineers together with data scientists are transforming and preparing data that’s consumed by ML models for training. And models ultimately go to production for model serving, where they’re monitored and retrained if necessary.

In the real world, only a small segment of an ML system is composed of the ML model code. The rest of the process consists of data collection, data consolidation, system configurations, model and data verification, debugging and testing, resource management and infrastructure serving, feature and metadata management, and monitoring.

An example illustrating ML model challenges

Let’s say that you have a team of data scientists and data engineers who are working on dynamic pricing for airline flight bookings. The business objective is to allocate ticket price based on travel dates; seat availability; and, to increase sales, a relative competitor pricing model.

You and your team work mostly independently in your own work environments, like Jupyter Notebooks, and use the dataset that’s available for training and validating the model. Maybe team members share notebooks with each other by email or they use some code versioning (GitHub, Bitbucket, etc.). They also have regular catch-up meetings to make sure that everyone is in sync and that the project is progressing as expected.

You’re all using allocated compute and storage resources (AI infrastructure) for training by executing the cells in your notebooks. After some time, your trained model is producing good enough results on your holdout test dataset, and you believe that it will work in the production environment and will predict better pricing for airline tickets. You also have data analysis and visualization reports in your notebooks that back the results and validate your model’s performance.

Finally, it’s time to deploy your best trained model and integrate it into the existing airline ticket–booking system. But there are a few unanswered questions that you need to take care of, including concerns such as:

How do you use the code cells of Jupyter Notebooks for a production flight-booking system and preserve the data transformation that took place during training?
In the production system, how are you going to continuously monitor model performance? And how are you going to compensate for deviation in the predictions that might occur due to changes in data distribution over time, which might result in model drift?
How is your team going to reproduce the experiments and fine-tune trained models for better performance, considering the data used at that point in time?
How can you effectively scale the model for retraining on a larger dataset?

If you don’t consider and mitigate these challenges, you can have disastrous consequences near the end of your project. You might have to rebuild everything from scratch, or the project might fail to reach the production stage.

Why you need a new approach

Successful and mature AI processes require automation of these phases or to smoothly carry out training of new models with new data or with new implementations. Automation helps abstract away the complexity and lets you focus on the actual problem at hand. Wait a second, isn’t that very similar to what DevOps practices are known for, and can’t we use a similar concept for ML? That’s right, I’m referring to machine learning operations, or MLOps for short.

But can we really use DevOps methodologies for ML and simply call it MLOps? It does seem to be an obvious option, because an ML system is a software system (Software 2.0) at its core. But it’s a different beast altogether, and it demands a new mindset for handling AI development and workflow management. The core difference between ML (Software 2.0) and a traditional software stack (Software 1.0) is that ML is not just code and configurations. Data is also an integral component of the ML lifecycle and defines the behavior of a trained model.

What is MLOps?

MLOps is a methodology and a practice for a collaborative approach, and it combines data engineering, ML, and DevOps. It aims to operationalize the process of training and tracking models at scale, deployment and maintenance of models in production, and the entire data pipeline that encompasses the ML system. MLOps also ensures the model performance and measures it against business objectives, and it enables continuous delivery of business value. The following figure shows some of the benefits of MLOps.

NetApp improves patient care

When your organization uses good MLOps practices, you can ultimately produce better results while being cost-effective. You can set up a platform and architecture in place to make the whole process as easy as pushing code to code versioning. The rest (packaging, preprocessing, training, ML versioning, model deployment, autoscaling, etc.) is taken care of for your team.

To learn more about MLOps and how Netapp^® AI makes it easier, check out our featured video.

Share this page

Muneer Ahmad Dedmari