Few-shot and zero-shot learning are machine learning paradigms that enable models to generalize to new tasks or classes from minimal labeled data, ranging from no examples (zero-shot) to a small handful (few-shot). These approaches underpin in-context learning in large language models and meta-learning in computer vision, where models adapt quickly by transferring knowledge from prior experience rather than requiring extensive task-specific training. The key challenge is to design representations, prompting strategies, and meta-learning algorithms that maximize generalization from extremely limited supervision, which makes these techniques essential for real-world applications where labeled data is scarce, expensive, or rapidly changing. Choices about demonstration selection, calibration methods, and architecture can determine whether a model performs near the state of the art or near chance on new tasks.
What This Cheat Sheet Covers
This cheat sheet spans 15 focused tables and 117 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Core Learning Paradigms
The foundational vocabulary of few-shot and zero-shot learning defines the problem structure, the data splits, and the main training philosophies. Knowing how these paradigms relate helps you choose the right approach for your data and compute constraints: prompting an LLM, training a metric-based model, or full meta-learning.
| Paradigm | Example | Description |
|---|---|---|
| Zero-shot learning | Classify sentiment: "I loved it!" → Positive | • Model performs a task with no examples, relying entirely on pre-trained knowledge and task instructions • works best for general tasks that resemble the training distribution. |
| One-shot learning | Example: "Great movie" → Positive; Classify: "Terrible film" | • Model receives exactly one example per class before inference • bridges zero-shot and few-shot by providing a minimal demonstration of the desired behavior. |
| Few-shot learning | 3 examples: "Loved it" → Pos; "Hated it" → Neg; "Okay" → Neu | • Model learns from 2–10 labeled examples per class • often improves performance over zero-shot on specialized or nuanced tasks. |
| K-shot learning | k=5: 5 examples per class | • Formalization of few-shot where k specifies the exact number of examples per class • typically k ∈ {1, 2, 5, 10} in research benchmarks. |
| N-way K-shot | 5-way 1-shot: 5 classes, 1 example each | • Standard task formulation where the model discriminates between N classes given K examples per class • defines the structure of the few-shot classification problem. |
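
The paradigms in the table can be made concrete with a small sketch. It is illustrative only and uses the standard library alone. `build_prompt` and `sample_episode` are hypothetical helper names, not part of any library, and the `Text: "..." -> label` prompt layout is an assumed format rather than a standard one. The first helper builds a zero-, one-, or few-shot prompt depending on how many demonstrations it receives. The second samples an N-way K-shot episode (a support set plus held-out queries) from a labeled dataset.

```python
import random
from collections import defaultdict


def build_prompt(instruction, demos, query):
    """Build a k-shot prompt from (text, label) demonstrations.

    An empty `demos` list yields a zero-shot prompt, one demonstration a
    one-shot prompt, and several a few-shot prompt. The layout is an
    assumption made for illustration.
    """
    lines = [instruction]
    for text, label in demos:
        lines.append(f'Text: "{text}" -> {label}')
    lines.append(f'Text: "{query}" ->')  # the model completes the label
    return "\n".join(lines)


def sample_episode(dataset, n_way, k_shot, n_query=1, seed=None):
    """Sample one N-way K-shot episode from (item, label) pairs.

    Returns (support, query): K labeled items per class for adaptation and
    n_query disjoint items per class for evaluation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in dataset:
        by_class[label].append(item)

    # Only classes with enough items for both support and query are usable.
    eligible = sorted(c for c, items in by_class.items()
                      if len(items) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query items")

    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query
```

Calling `build_prompt("Classify sentiment.", [], "I loved it!")` gives the zero-shot case, while passing the three demonstrations from the table gives the few-shot case. `sample_episode(data, n_way=5, k_shot=1)` corresponds to the 5-way 1-shot row. Keeping the query items disjoint from the support set is what lets the episode measure generalization rather than memorization.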