ONNX and ONNX Runtime Cheat Sheet

Updated 2026-05-21

ONNX (Open Neural Network Exchange) is an open format for representing machine learning models as portable computation graphs, enabling interoperability between frameworks such as PyTorch and TensorFlow. ONNX Runtime is Microsoft's high-performance inference engine that executes those graphs on CPUs, GPUs, mobile devices, and browsers. The core value proposition is a clean separation between training and deployment: export once, then run through whichever backend fits the target, such as CUDA, TensorRT, DirectML, CoreML, QNN, or plain WebAssembly. The key mental model is that performance is layered. The model graph itself can be optimized (graph fusion, quantization, FP16 conversion), the execution provider determines the hardware backend, and session/run options fine-tune threading and memory. These three levers are configured separately and can be combined.

What This Cheat Sheet Covers

This topic spans 19 focused tables and 139 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

Table 1: ONNX Model Format and Core Concepts
Table 2: ONNX Opsets and Operator Versioning
Table 3: Exporting from PyTorch
Table 4: Exporting from TensorFlow / Keras (tf2onnx)
Table 5: Converting scikit-learn Models (sklearn-onnx / skl2onnx)
Table 6: ONNX Model Verification and Shape Inference
Table 7: Building ONNX Models Programmatically
Table 8: ONNX Runtime InferenceSession
Table 9: SessionOptions Configuration
Table 10: Graph Optimization Levels
Table 11: Execution Providers
Table 12: Execution Provider Configuration: CUDA and TensorRT
Table 13: IOBinding for Zero-Copy Inference
Table 14: Quantization
Table 15: FP16 and Mixed Precision Conversion
Table 16: Transformer Model Optimizer
Table 17: ONNX Runtime Web (Browser)
Table 18: ONNX Runtime Mobile and ORT Format
Table 19: Profiling and Performance Diagnosis

Table 1: ONNX Model Format and Core Concepts

ONNX models are serialized Protocol Buffers messages, usually saved with a .onnx extension; the message schema itself is defined in onnx.proto. Understanding how ModelProto, opsets, and domains fit together is essential before exporting or loading any model.

Concept	Example	Description
ModelProto	`model = onnx.load("model.onnx")` `print(model.ir_version)`	Top-level container holding the computation graph, opset imports, model version, and metadata.
GraphProto	`graph = model.graph` `print(len(graph.node))`	• Directed Acyclic Graph of NodeProtos • contains `node`, `input`, `output`, and `initializer` fields
NodeProto	`node = graph.node[0]` `print(node.op_type, node.input, node.output)`	• Represents a single operator invocation • references input/output tensors by string name
TensorProto	`init = graph.initializer[0]` `arr = numpy_helper.to_array(init)`	Stores constant tensor values (weights/biases) embedded directly in the model.
ValueInfoProto	`vi = helper.make_tensor_value_info('X', TensorProto.FLOAT, [None, 3])`	Describes shape and dtype of a graph edge (input, output, or intermediate value).
