Kubeflow Cheat Sheet

Updated 2026-05-21

Kubeflow is an open-source, cloud-native MLOps platform built on Kubernetes that orchestrates the entire machine learning lifecycle — from interactive development in notebooks to distributed training, hyperparameter tuning, and production model serving. It addresses the core challenge of reproducibly moving ML workloads from a data scientist's laptop to scalable, multi-tenant infrastructure without rewriting pipelines. The critical mental model to carry through this cheat sheet is that almost everything in Kubeflow is a Kubernetes Custom Resource — InferenceService, PyTorchJob, Experiment, Notebook — so standard kubectl tooling, RBAC, and Kubernetes-native observability all apply directly.
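
As a rough illustration of the "everything is a Custom Resource" model, the sketch below builds a manifest for one of the kinds named above using only the Python standard library. The `build_manifest` helper, the resource name, the namespace, and the storage URI are hypothetical, and the API group/version strings are the ones believed current for recent releases, so check them against the CRDs installed in your cluster.

```python
import json

# API group/version per kind. Assumed from recent releases; confirm with
# `kubectl api-resources` on your own cluster before relying on them.
API_VERSIONS = {
    "InferenceService": "serving.kserve.io/v1beta1",
    "PyTorchJob": "kubeflow.org/v1",
    "Experiment": "kubeflow.org/v1beta1",
    "Notebook": "kubeflow.org/v1",
}


def build_manifest(kind, name, namespace, spec):
    """Return a manifest dict with the four top-level fields every
    Kubernetes object shares: apiVersion, kind, metadata, spec."""
    return {
        "apiVersion": API_VERSIONS[kind],
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


# Hypothetical name, namespace, and storage URI, for illustration only.
isvc = build_manifest(
    kind="InferenceService",
    name="demo-sklearn",
    namespace="demo-profile",
    spec={
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "s3://example-bucket/models/demo",
            }
        }
    },
)

# kubectl accepts JSON as well as YAML, so no YAML library is needed.
manifest_json = json.dumps(isvc, indent=2)
print(manifest_json)
```

Because the result is an ordinary Kubernetes object, the printed JSON could be piped to `kubectl apply -f -`, and the same RBAC rules that govern any namespaced resource would apply to it.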

What This Cheat Sheet Covers

This cheat sheet spans 19 focused tables and 141 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

Table 1: Kubeflow Core Components Overview
Table 2: KFP v2 Component Authoring
Table 3: KFP v2 Artifact Types
Table 4: KFP v2 Pipeline Authoring and Compilation
Table 5: KFP v2 Control Flow
Table 6: KFP v2 Task Configuration and Caching
Table 7: KFP v2 Pipeline Submission and Recurring Runs
Table 8: Kubeflow Training Operator — Job Types
Table 9: Kubeflow Trainer v2 — TrainJob API and Distributed Strategies
Table 10: Katib — Hyperparameter Tuning and NAS
Table 11: Katib — Metrics Collection and Early Stopping
Table 12: KServe — InferenceService Core Concepts
Table 13: KServe — Serving Runtimes and Protocols
Table 14: KServe — Canary Rollout and Traffic Management
Table 15: Kubeflow Central Dashboard, Profiles, and Multi-Tenancy
Table 16: Kubeflow MLflow Integration and Experiment Tracking
Table 17: Kubeflow Deployment — Cloud Distributions and Installation
Table 18: Kubeflow Notebooks — Configuration and Lifecycle
Table 19: Kubeflow Model Registry

Table 1: Kubeflow Core Components Overview

Kubeflow is a suite of composable components rather than a monolithic framework; knowing which component does what prevents the confusion of treating it as a single tool. Each component addresses a distinct phase of the ML lifecycle and can be installed and used independently.

Component	Example	Description
Kubeflow Pipelines (KFP)	`@dsl.pipeline` + `Compiler().compile()`	• ML workflow orchestration — defines, compiles, and executes multi-step ML pipelines as Kubernetes pods • uses Argo Workflows as its execution engine
Kubeflow Notebooks	JupyterLab / VS Code / RStudio on K8s pod	Spawns interactive IDE containers (JupyterLab, VS Code via code-server, RStudio) as Kubernetes pods inside a user's profile namespace
Kubeflow Trainer (Training Operator)	`PyTorchJob`, `TFJob`, `MPIJob` CRDs	Distributed training — Kubernetes operators that orchestrate multi-node, multi-GPU training jobs across PyTorch, TensorFlow, MPI, XGBoost, JAX, and more
KServe	`InferenceService` CRD	Model serving — standardized inference platform supporting serverless autoscaling, canary rollouts, multi-framework runtimes, and OpenAI-compatible APIs
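
To connect the table to everyday tooling, the sketch below maps each component to a resource type it creates and prints the `kubectl get` command that would list those resources in a profile namespace. The `kubectl_get` helper and the `demo-profile` namespace are hypothetical. The fully qualified resource names are assumptions based on the CRD groups commonly installed with these components, and the Pipelines entry lists Argo `Workflow` objects because Argo Workflows is the execution engine named in the table.

```python
import shlex

# Component -> fully qualified resource name (plural.group).
# Assumed names; confirm with `kubectl api-resources` on your cluster.
RESOURCES = {
    "Kubeflow Pipelines": "workflows.argoproj.io",
    "Kubeflow Notebooks": "notebooks.kubeflow.org",
    "Kubeflow Trainer": "pytorchjobs.kubeflow.org",
    "KServe": "inferenceservices.serving.kserve.io",
}


def kubectl_get(resource, namespace):
    """Build (but do not run) the argv for listing a resource type."""
    return ["kubectl", "get", resource, "--namespace", namespace]


# "demo-profile" is a hypothetical profile namespace.
commands = {
    component: shlex.join(kubectl_get(resource, "demo-profile"))
    for component, resource in RESOURCES.items()
}

for component, command in commands.items():
    print(f"{component}: {command}")
```

The commands are only printed, not executed, so the sketch runs without cluster access. Passing the argv list to `subprocess.run` would be the natural next step on a machine with `kubectl` configured.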
