MLOps Cheat Sheet

Updated 2026-04-20

MLOps (Machine Learning Operations) is a discipline that extends DevOps principles to machine learning systems, enabling teams to build, deploy, and maintain production-grade AI models at scale. It bridges experimental data science and reliable production systems through automation, continuous integration, and observability. In 2026, MLOps encompasses three distinct sub-domains: traditional MLOps for classical models, LLMOps for large language models, and the emerging AgentOps for autonomous AI agents. Two developments have reshaped governance and lifecycle management across the industry: the EU AI Act's high-risk AI provisions (effective August 2026) and the launch of MLflow 3.0 with native GenAI and agent tracing.

What This Cheat Sheet Covers

This topic spans 18 focused tables and 174 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core MLOps Principles
Table 2: Model Training & Experimentation
Table 3: Data Management & Versioning
Table 4: Model Deployment Strategies
Table 5: Model Serving Platforms
Table 6: Monitoring & Observability
Table 7: Pipeline Orchestration Tools
Table 8: Containerization & Packaging
Table 9: Model Governance & Compliance
Table 10: Testing & Validation
Table 11: CI/CD for ML
Table 12: Explainability & Interpretability
Table 13: Advanced MLOps Patterns
Table 14: LLMOps (Large Language Model Operations)
Table 15: AgentOps
Table 16: Security in MLOps
Table 17: Cost Optimization
Table 18: Tools Ecosystem

Table 1: Core MLOps Principles

These are the foundational practices every MLOps team builds on—the same version control, CI/CD, and observability ideas borrowed from DevOps, plus the ML-specific additions of continuous training, experiment tracking, feature stores, and model lineage. Master these eleven and you have the vocabulary for everything that follows.

Principle	Example	Description
Version Control	`dvc add data/train.csv` `git commit -m "v1.2 dataset"`	Track datasets, models, and code together using Git + DVC to ensure reproducibility and enable rollback to any previous state.
Continuous Integration (CI)	`pytest tests/` `flake8 src/`	Automatically run tests and linting on every code commit to catch bugs early and maintain code quality standards.
Continuous Deployment (CD)	`mlflow models serve -m models:/prod/1` `kubectl apply -f deployment.yaml`	Automate model deployment to production with zero-downtime updates and instant rollback capability.
Continuous Training (CT)	`airflow dags trigger retrain_model` `if drift > 0.1: retrain()`	Automatically retrain models when performance degrades or new data arrives, keeping predictions accurate over time.
Experiment Tracking	`mlflow.log_param("lr", 0.01)` `mlflow.log_metric("accuracy", 0.95)`	Record hyperparameters, metrics, and artifacts for every training run to compare experiments and reproduce best results.
Model Registry	`mlflow.register_model("runs:/abc/model", "churn_predictor")`	Centralized repository storing versioned models with metadata, lineage, and stage transitions (staging → production).
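
The `if drift > 0.1: retrain()` check in the Continuous Training row can be made concrete with a small sketch. The table does not say how drift is measured, so this example assumes the Population Stability Index (PSI) over a single numeric feature. The names `psi` and `maybe_retrain`, the bin count, and the `retrain` callback are hypothetical and chosen only for illustration. Only the 0.1 threshold comes from the table.

```python
import math
from typing import Callable, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature.

    Assumption: PSI is used here as the drift score; the cheat sheet itself
    does not name a specific drift metric.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps log() defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def maybe_retrain(
    reference: Sequence[float],
    current: Sequence[float],
    retrain: Callable[[], None],  # hypothetical hook, e.g. one that triggers a pipeline run
    threshold: float = 0.1,       # threshold taken from the table's example
) -> bool:
    """Call `retrain` when the drift score exceeds the threshold."""
    drift = psi(reference, current)
    if drift > threshold:
        retrain()
        return True
    return False
```

In practice the `retrain` callback would hand off to an orchestrator rather than train in-process, and the drift score would usually be computed per feature on a schedule. Both choices depend on the team's stack and are left open here.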
