NVIDIA Triton Inference Server Cheat Sheet

Updated 2026-05-21

NVIDIA Triton Inference Server is an open-source inference serving platform that deploys AI models from many frameworks (TensorRT, ONNX Runtime, PyTorch, TensorFlow, Python, vLLM, and more) behind a single standardized API. It sits at the center of GPU-accelerated inference stacks and provides dynamic batching, concurrent multi-model execution, and observability through Prometheus metrics, which has made it a common choice for organizations running inference in production at scale. The key mental model is that Triton is a model management and scheduling layer, not an inference engine: the actual execution happens inside pluggable backends, and Triton orchestrates when, how, and on which hardware those backends run.

What This Cheat Sheet Covers

This cheat sheet spans 19 tables and 175 indexed concepts, from foundational concepts through advanced details. The table-by-table outline follows.

Table 1: Model Repository Directory Structure
Table 2: Core config.pbtxt Fields
Table 3: Version Policy
Table 4: Dynamic Batching
Table 5: Sequence Batching (Stateful Models)
Table 6: Instance Groups and GPU Resource Sharing
Table 7: Ensemble Models
Table 8: Backends
Table 9: Optimization Settings
Table 10: Model Warmup
Table 11: Response Cache
Table 12: HTTP/REST and gRPC API (KServe v2 / Open Inference Protocol)
Table 13: Prometheus Metrics
Table 14: Model Management and Server Startup Flags
Table 15: Python Backend (TritonPythonModel)
Table 16: Business Logic Scripting (BLS)
Table 17: Performance Analyzer (perf_analyzer)
Table 18: Model Analyzer
Table 19: Client Libraries (tritonclient)

Table 1: Model Repository Directory Structure

Every model Triton serves must live inside a model repository with a strict hierarchical layout. Understanding this structure comes before any configuration, because Triton reads the repository at startup (or on demand) and derives which models and versions to serve from the directory tree and the config files found within it.

Field	Example	Description
Model repository root	`tritonserver --model-repository=/models`	• Top-level directory passed at launch • all model subdirectories live here
Model directory	`/models/resnet50/`	• One directory per model • directory name becomes the model name served by Triton
Version subdirectory	`/models/resnet50/1/`	• Numeric subdirectories hold actual model files • non-numeric names and names starting with `0` are ignored
config.pbtxt	`/models/resnet50/config.pbtxt`	• Optional ModelConfig protobuf text file • required for some backends, auto-generated for others
Model file (TensorRT)	`/models/resnet50/1/model.plan`	• Default TensorRT model filename • overridable via `default_model_filename` in config
Model file (ONNX)	`/models/classifier/1/model.onnx`	• Default ONNX model filename • a multi-file ONNX model goes in a directory (named `model.onnx` by default) whose main model file is also named `model.onnx`
Model file (PyTorch)	`/models/bert/1/model.pt`	Default TorchScript filename for the PyTorch backend.
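
The layout rules in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not Triton's own loader: it builds a throwaway repository with empty placeholder files and then lists which version subdirectories pass the naming rule from the table (numeric, not starting with `0`). The helper names (`is_version_dir`, `scan_repository`, `build_demo_repository`) and the dictionary keys are hypothetical and exist only for this example.

```python
import re
import tempfile
from pathlib import Path

# Default model filenames taken from Table 1. The keys are illustrative
# labels for this sketch, not Triton backend identifiers.
DEFAULT_MODEL_FILENAMES = {
    "tensorrt": "model.plan",
    "onnx": "model.onnx",
    "pytorch": "model.pt",
}


def is_version_dir(name: str) -> bool:
    """True for numeric names that do not start with 0 (the rule in Table 1)."""
    return re.fullmatch(r"[1-9][0-9]*", name) is not None


def scan_repository(root: Path) -> dict[str, list[int]]:
    """Map each model directory name to its sorted, usable version numbers."""
    models = {}
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        models[model_dir.name] = sorted(
            int(p.name)
            for p in model_dir.iterdir()
            if p.is_dir() and is_version_dir(p.name)
        )
    return models


def build_demo_repository(root: Path) -> None:
    """Create the Table 1 layout with empty placeholder files.

    The files are empty, so a real server could not load them; they only
    demonstrate where each file is expected to live.
    """
    demo = [
        ("resnet50", "tensorrt", ["1", "2"]),
        ("classifier", "onnx", ["1"]),
        ("bert", "pytorch", ["1"]),
    ]
    for model, kind, versions in demo:
        for version in versions:
            version_dir = root / model / version
            version_dir.mkdir(parents=True)
            (version_dir / DEFAULT_MODEL_FILENAMES[kind]).touch()
        (root / model / "config.pbtxt").touch()
    # Two directories the naming rule rejects: non-numeric and leading zero.
    (root / "resnet50" / "latest").mkdir()
    (root / "resnet50" / "01").mkdir()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        build_demo_repository(repo)
        print(scan_repository(repo))
```

Running the script prints the model names with their accepted versions; the `latest` and `01` directories under `resnet50` are skipped, mirroring the "ignored" rule in the Version subdirectory row.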
