AI in Production Cheat Sheet

Updated 2026-04-28

AI in Production refers to the operational deployment, scaling, and management of machine learning models beyond experimental environments. Unlike traditional software, production ML systems face unique challenges including model drift, data distribution shifts, and performance degradation over time — requiring continuous monitoring, automated retraining, and sophisticated deployment strategies. The field now encompasses LLMOps and AgentOps alongside classical MLOps, covering infrastructure optimization, observability tooling, guardrails, and governance frameworks that ensure models deliver reliable, cost-effective predictions at scale while maintaining fairness, explainability, and compliance with evolving regulations such as the EU AI Act.

What This Cheat Sheet Covers

This topic spans 17 focused tables and 137 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Deployment Strategies
Table 2: Model Serving Patterns
Table 3: Scaling and Optimization
Table 4: Monitoring and Observability
Table 5: Drift and Quality Management
Table 6: Retraining and Automation
Table 7: Model Versioning and Registry
Table 8: Feature Engineering for Production
Table 9: Infrastructure and Orchestration
Table 10: Testing and Validation
Table 11: Explainability and Governance
Table 12: Performance Optimization
Table 13: Cost Management
Table 14: Reliability and SLA Management
Table 15: LLMOps and Generative AI Production
Table 16: AI Agent Operations (AgentOps)
Table 17: Advanced Topics

Table 1: Deployment Strategies

Shipping a new model is the moment everything can go wrong, so the art is controlling the blast radius. Each of these strategies trades off speed, safety, and cost differently: canary and rolling deployments shift traffic to the new version gradually, blue-green flips all traffic at once with one-click rollback, and shadow or champion-challenger setups let you validate behavior on real traffic before anyone depends on it.

Strategy	Example	Description
Canary Deployment	`traffic_split = {"variant_1": 0.95, "variant_2": 0.05}`	Gradually routes a small percentage of traffic to the new model, monitors performance, then widens rollout if stable — minimal blast radius during releases.
Blue-Green Deployment	`blue_env = current_model` `green_env = new_model` `switch_traffic(green_env)`	Maintains two identical environments and flips traffic instantly — zero downtime and instant rollback to blue if green fails.
Shadow Deployment	`predictions_prod = model_v1.predict(X)` `predictions_shadow = model_v2.predict(X)`	Runs the new model in parallel on real traffic but never returns its predictions to users; validates behavior on live inputs, with no user impact, before promotion.
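
The canary and shadow rows can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any serving platform's API: `pick_variant`, `serve_with_shadow`, and the in-memory `log` list are hypothetical names, the models are plain callables, and hashing the request ID is one possible way to keep a given caller on the same variant across requests.

```python
import hashlib
from typing import Mapping

# Hypothetical split, mirroring the canary row in the table above.
TRAFFIC_SPLIT = {"variant_1": 0.95, "variant_2": 0.05}


def pick_variant(request_id: str, split: Mapping[str, float]) -> str:
    """Map a request ID to a variant, deterministically and in proportion to the weights."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to the unit interval.
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, weight in split.items():
        cumulative += weight
        if point < cumulative:
            return name
    return name  # fallback to the last variant if rounding leaves a gap


def serve_with_shadow(x, primary, shadow, log):
    """Return the primary prediction; record the shadow prediction without serving it."""
    served = primary(x)
    try:
        log.append({"input": x, "served": served, "shadow": shadow(x)})
    except Exception as exc:  # a shadow failure must never affect the caller
        log.append({"input": x, "served": served, "shadow_error": repr(exc)})
    return served
```

Widening a canary rollout then amounts to changing the weights in `TRAFFIC_SPLIT`. In a real system the shadow call would usually run asynchronously so it adds no latency to the served response; it is kept inline here for brevity.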
