NVIDIA Triton Inference Server is an open-source production inference serving platform that can deploy AI models from virtually any framework (TensorRT, ONNX Runtime, PyTorch, TensorFlow, Python, vLLM, and more) through a unified, standardized API. It sits at the center of GPU-accelerated inference stacks, offering dynamic batching, concurrent multi-model execution, and deep observability through Prometheus metrics, making it the production standard for organizations running AI at scale. The key mental model to internalize is that Triton is a model management and scheduling layer, not an inference engine itself: the actual execution happens inside pluggable backends, and Triton orchestrates when, how, and on which hardware those backends run.
What This Cheat Sheet Covers
This cheat sheet spans 19 focused tables and 175 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Model Repository Directory Structure
Every model Triton can serve must live inside a model repository with a strict hierarchical layout. Understanding this structure is the first step before any configuration: Triton reads the repository at startup (or on demand) and derives all serving behavior from the directory tree and config files found within it.
| Example | Description |
|---|---|
| `tritonserver --model-repository=/models` | Top-level directory passed at launch; all model subdirectories live here. |
| `/models/resnet50/` | One directory per model; the directory name becomes the model name served by Triton. |
| `/models/resnet50/1/` | Numeric subdirectories hold the actual model files; non-numeric names and names starting with 0 are ignored. |
| `/models/resnet50/config.pbtxt` | Optional ModelConfig protobuf text file; required for some backends, auto-generated for others. |
| `/models/resnet50/1/model.plan` | Default TensorRT model filename; overridable via `default_model_filename` in the config. |
| `/models/classifier/1/model.onnx` | Default ONNX model filename; multi-file ONNX models go in a directory with `model.onnx` as the main file. |
| `/models/bert/1/model.pt` | Default TorchScript filename for the PyTorch backend. |
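
The layout rules in the table above can be checked with a short script before launching the server. The following is a minimal Python sketch, not Triton's own loading logic: the helper names (`is_valid_version_dir`, `scan_repository`, `build_example_repository`) and the backend labels in `DEFAULT_FILENAMES` are hypothetical, and the placeholder files it creates are empty, so they are not loadable models. It only mirrors the naming rule stated above (numeric version directories that do not start with 0).

```python
import tempfile
from pathlib import Path

# Default model filenames from the table above. The dictionary keys are
# labels invented for this sketch, not Triton configuration values.
DEFAULT_FILENAMES = {
    "tensorrt": "model.plan",
    "onnx": "model.onnx",
    "pytorch": "model.pt",
}


def is_valid_version_dir(name: str) -> bool:
    """Apply the rule from the table: numeric names only, not starting with 0."""
    return name.isascii() and name.isdigit() and not name.startswith("0")


def scan_repository(root: Path) -> dict[str, list[int]]:
    """Map each model directory name to its sorted list of usable versions."""
    models: dict[str, list[int]] = {}
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        versions = sorted(
            int(v.name)
            for v in model_dir.iterdir()
            if v.is_dir() and is_valid_version_dir(v.name)
        )
        models[model_dir.name] = versions
    return models


def build_example_repository(root: Path) -> None:
    """Create empty placeholder files at the example paths from the table."""
    layout = [
        ("resnet50", "1", DEFAULT_FILENAMES["tensorrt"]),
        ("classifier", "1", DEFAULT_FILENAMES["onnx"]),
        ("bert", "1", DEFAULT_FILENAMES["pytorch"]),
    ]
    for model, version, filename in layout:
        version_dir = root / model / version
        version_dir.mkdir(parents=True)
        (version_dir / filename).touch()  # empty placeholder, not a real model
    (root / "resnet50" / "config.pbtxt").touch()
    # Directories that the naming rule says are skipped.
    for ignored in ("0", "01", "latest"):
        (root / "resnet50" / ignored).mkdir()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        build_example_repository(repo)
        print(scan_repository(repo))
```

Run as a script, this builds the three example models in a temporary directory and reports version 1 for each, while the `0`, `01`, and `latest` directories under `resnet50` are left out of the result. A check like this is cheap to run in CI, where a misnamed version directory would otherwise only surface as a model that silently fails to appear at serve time.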