Hugging Face Transformers Cheat Sheet

Updated 2026-04-28

Hugging Face Transformers is a Python library that provides unified access to thousands of pretrained transformer models across natural language processing, computer vision, audio, and multimodal tasks. With the release of Transformers v5, the library went PyTorch-only, made quantization a first-class feature, and introduced a modular AttentionInterface plus a built-in `transformers serve` command for OpenAI-compatible inference. The Auto classes resolve the correct model architecture from a checkpoint's configuration, while the Pipeline API and the TRL post-training library cover everything from one-line inference to full RLHF alignment workflows.

What This Cheat Sheet Covers

This topic spans 19 tables and 188 indexed concepts, from foundational concepts through advanced details. The complete table-by-table outline follows.

Table 1: Core Loading and Auto Classes
Table 2: Pipeline API for Inference
Table 3: Tokenization
Table 4: Model Training with Trainer
Table 5: Fine-Tuning Techniques
Table 6: TRL Post-Training
Table 7: Chat Templates
Table 8: Model Architectures
Table 9: Datasets Integration
Table 10: Text Generation Parameters
Table 11: Model Saving and Sharing
Table 12: Optimization and Quantization
Table 13: Accelerate for Distributed Training
Table 14: Callbacks and Logging
Table 15: Model Evaluation and Metrics
Table 16: Advanced Tokenization Features
Table 17: Model Hub and Repository Management
Table 18: Deployment and Inference
Table 19: Common Tasks and Use Cases

Table 1: Core Loading and Auto Classes

The Auto classes are the front door to the whole library — instead of importing a specific model class, you hand from_pretrained a checkpoint name and the right tokenizer, config, and architecture are resolved for you. Which AutoModelFor* you pick declares the task head you want bolted on, from causal generation to token classification, and the same pattern extends to multimodal processors and GGUF-quantized weights.

Class	Example	Description
AutoTokenizer	`from transformers import AutoTokenizer` `tokenizer = AutoTokenizer.from_pretrained("gpt2")`	• Automatically selects and loads the correct tokenizer for a given model • handles text-to-token conversion with model-specific vocabulary.
AutoModel	`from transformers import AutoModel` `model = AutoModel.from_pretrained("bert-base-uncased")`	• Loads the base model architecture from a checkpoint • returns raw hidden states without task-specific heads.
AutoModelForCausalLM	`from transformers import AutoModelForCausalLM` `model = AutoModelForCausalLM.from_pretrained("gpt2")`	• Loads model for causal (left-to-right) language modeling • predicts next token given previous context, used for text generation.
AutoModelForSequenceClassification	`from transformers import AutoModelForSequenceClassification` `model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3)`	• Loads model with a classification head for labeling entire sequences • commonly used for sentiment analysis or topic classification.
AutoProcessor	`from transformers import AutoProcessor` `processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")`	• Loads the unified processor for multimodal models • handles combined image + text inputs in a single interface, first-class in v5.
AutoConfig	`from transformers import AutoConfig` `config = AutoConfig.from_pretrained("t5-small")`	• Loads model configuration settings (hidden size, layers, attention heads) • enables architecture inspection without loading weights.
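
To make the idea behind the Auto classes concrete, here is a toy sketch of config-driven dispatch: a loader reads the `model_type` field from a checkpoint's `config.json` and looks up the matching class in a registry. This is an illustration only and not the Transformers implementation; every name in it (`ToyAutoModel`, `TOY_REGISTRY`, `ToyBertModel`, `ToyGPT2Model`, `make_toy_checkpoint`) is hypothetical, and it uses only the Python standard library so it runs without downloading any weights.

```python
# Toy sketch of config-driven dispatch. NOT the Transformers implementation:
# all class and function names below are hypothetical.
import json
import tempfile
from pathlib import Path


class ToyBertModel:
    def __init__(self, config):
        self.config = config


class ToyGPT2Model:
    def __init__(self, config):
        self.config = config


# Maps the "model_type" field of a config to a concrete class.
TOY_REGISTRY = {"bert": ToyBertModel, "gpt2": ToyGPT2Model}


class ToyAutoModel:
    @classmethod
    def from_pretrained(cls, checkpoint_dir):
        # Read the checkpoint's config.json and look up the matching class.
        config_path = Path(checkpoint_dir) / "config.json"
        config = json.loads(config_path.read_text())
        model_type = config["model_type"]
        if model_type not in TOY_REGISTRY:
            raise ValueError(f"Unrecognized model_type: {model_type!r}")
        return TOY_REGISTRY[model_type](config)


def make_toy_checkpoint(model_type, hidden_size):
    # Write a minimal config.json into a fresh temporary directory.
    checkpoint_dir = Path(tempfile.mkdtemp())
    config = {"model_type": model_type, "hidden_size": hidden_size}
    (checkpoint_dir / "config.json").write_text(json.dumps(config))
    return checkpoint_dir


bert_like = ToyAutoModel.from_pretrained(make_toy_checkpoint("bert", 768))
gpt2_like = ToyAutoModel.from_pretrained(make_toy_checkpoint("gpt2", 768))
print(type(bert_like).__name__)  # ToyBertModel
print(type(gpt2_like).__name__)  # ToyGPT2Model
```

The caller never names a concrete class; the checkpoint's own configuration decides which one is built. That is why the same `from_pretrained` call in the table above works across different checkpoints.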
