AI in Production refers to the operational deployment, scaling, and management of machine learning models beyond experimental environments. Unlike traditional software, production ML systems face unique challenges including model drift, data distribution shifts, and performance degradation over time — requiring continuous monitoring, automated retraining, and sophisticated deployment strategies. The field now encompasses LLMOps and AgentOps alongside classical MLOps, covering infrastructure optimization, observability tooling, guardrails, and governance frameworks that ensure models deliver reliable, cost-effective predictions at scale while maintaining fairness, explainability, and compliance with evolving regulations such as the EU AI Act.
What This Cheat Sheet Covers
This topic spans 17 focused tables and 137 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Deployment Strategies
| Strategy | Example | Description |
|---|---|---|
| Canary deployment | `traffic_split = {"variant_1": 0.95, "variant_2": 0.05}` | Routes a small percentage of traffic to the new model, monitors its performance, then widens the rollout if it stays stable; keeps the blast radius small during releases. |
| Blue-green deployment | `blue_env = current_model; green_env = new_model; switch_traffic(green_env)` | Maintains two identical environments and flips all traffic from one to the other in a single step; gives zero downtime and instant rollback to blue if green fails. |
| Shadow deployment | `predictions_prod = model_v1.predict(X); predictions_shadow = model_v2.predict(X)` | Runs the new model in parallel on real traffic without returning its predictions to users; validates behavior before promotion without affecting responses. |
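
The three strategies in the table can be sketched in a few lines of plain Python. This is a minimal illustration, not a production router: the `Model` type alias, `pick_variant`, `serve_canary`, `BlueGreenRouter`, and `serve_with_shadow` are all hypothetical names introduced here, and models are assumed to be simple callables rather than objects from any particular serving framework.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical model type for illustration: one numeric input, one numeric score.
Model = Callable[[float], float]


def pick_variant(traffic_split: Dict[str, float], rng: random.Random) -> str:
    """Choose a variant name with probability proportional to its weight."""
    names = list(traffic_split)
    weights = [traffic_split[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def serve_canary(
    x: float,
    models: Dict[str, Model],
    traffic_split: Dict[str, float],
    rng: random.Random,
) -> Tuple[str, float]:
    """Canary: send each request to one variant according to the split."""
    variant = pick_variant(traffic_split, rng)
    return variant, models[variant](x)


class BlueGreenRouter:
    """Blue-green: both environments stay loaded; only the active pointer moves."""

    def __init__(self, blue: Model, green: Model) -> None:
        self.envs: Dict[str, Model] = {"blue": blue, "green": green}
        self.active = "blue"

    def switch_traffic(self, target: str) -> None:
        """Flip all traffic to `target`; calling it with "blue" is the rollback."""
        if target not in self.envs:
            raise ValueError(f"unknown environment: {target}")
        self.active = target

    def predict(self, x: float) -> float:
        return self.envs[self.active](x)


def serve_with_shadow(
    x: float,
    prod_model: Model,
    shadow_model: Model,
    shadow_log: List[Tuple[float, float, float]],
) -> float:
    """Shadow: the caller only ever receives the production prediction."""
    prediction = prod_model(x)
    try:
        # The shadow prediction is recorded for later comparison, never returned.
        shadow_log.append((x, prediction, shadow_model(x)))
    except Exception:
        # A failing shadow model must not affect the user-facing response.
        pass
    return prediction
```

In a real deployment the split, the environment switch, and the traffic mirroring would usually live in a load balancer, service mesh, or managed endpoint rather than in application code; the sketch only shows the routing logic each strategy implies. Note that this version calls the shadow model synchronously, so it adds latency to every request; an asynchronous queue would be a common alternative.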