MLflow Cheat Sheet

Updated 2026-05-21

MLflow is an open-source platform for managing the full machine learning lifecycle, from experiment tracking and model packaging to registry management and production deployment. It sits at the center of many MLOps workflows, giving data scientists a unified interface regardless of which library they use (scikit-learn, PyTorch, TensorFlow, XGBoost, or LLM tooling). The key mental model: every piece of work in MLflow is a run inside an experiment, every run can log params, metrics, and artifacts, and every logged model can be registered as a new version in the Model Registry. Together these make a team's experiments reproducible, comparable, and deployable with little overhead.

What This Cheat Sheet Covers

This cheat sheet spans 20 focused tables and 142 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

Table 1: Core Tracking API — Runs and Experiments
Table 2: Artifact Logging
Table 3: Dataset Tracking (mlflow.data)
Table 4: Autologging
Table 5: Model Flavors and Logging
Table 6: Model Signatures and Input Examples
Table 7: Custom pyfunc Models
Table 8: Model Registry
Table 9: Search and Query API
Table 10: Nested Runs and Hyperparameter Tuning
Table 11: System Metrics Logging
Table 12: Model Evaluation (mlflow.evaluate)
Table 13: GenAI / LLM Tracking and Tracing
Table 14: Prompt Registry
Table 15: LLM / GenAI Evaluation
Table 16: Model Serving and Deployment
Table 17: MLflow Projects
Table 18: MLflow Recipes
Table 19: Tracking Server Architecture
Table 20: AI Gateway

Table 1: Core Tracking API — Runs and Experiments

The tracking API is the foundation of everything in MLflow. Understanding how to create experiments, start runs, and log data programmatically is the entry point before using autologging, the UI, or any advanced feature.

Function	Example	Description
mlflow.set_experiment()	`mlflow.set_experiment("fraud-detection")`	• Sets the active experiment by name • creates it if it doesn't exist • All subsequent runs go into this experiment
mlflow.start_run()	`with mlflow.start_run(run_name="v1") as run:` `mlflow.log_param("lr", 0.01)`	• Opens a new tracking run as a context manager • automatically calls `end_run()` on exit, even if an exception is raised • Preferred over manual `end_run()`.
mlflow.log_param()	`mlflow.log_param("n_estimators", 100)`	• Logs a single key-value hyperparameter • the value is stored as a string • Params are immutable within a run: logging the same key again with a different value raises `MlflowException`
mlflow.log_params()	`mlflow.log_params({"lr": 0.01, "epochs": 10})`	• Logs multiple parameters at once from a dict • equivalent to calling `log_param()` in a loop but more efficient
mlflow.log_metric()	`mlflow.log_metric("accuracy", 0.94, step=5)`	• Logs a numeric metric • optional `step` enables time-series charting in the UI • MLflow keeps the full history of values
mlflow.log_metrics()	`mlflow.log_metrics({"mae": 0.12, "rmse": 0.34})`	• Logs multiple metrics simultaneously • accepts optional `step` and `timestamp`.
mlflow.set_tag()	`mlflow.set_tag("team", "nlp-squad")`	• Attaches a string metadata tag to the run • tags appear in the UI and are searchable in filter strings, including pattern matching such as `tags.team ILIKE '%nlp%'`
