Meta's Llama (Large Language Model Meta AI) is a family of open-weight large language models that has evolved from a research-only release in 2023 into one of the most widely deployed model families in the world. Llama models range from compact 1B-parameter edge models to massive mixture-of-experts architectures exceeding 400B total parameters, so deployment targets span everything from a single smartphone to multi-GPU server clusters. The family's distinguishing feature is its open weights, released under a commercial-friendly community license that lets developers fine-tune, self-host, and build products without vendor lock-in. Understanding the family requires tracking several parallel axes at once: model generation (3, 3.1, 3.2, 3.3, 4), size tier, modality (text-only vs. vision), and variant type (base vs. instruct). Each combination has distinct capabilities, prompt formats, and deployment requirements.
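The axes above can be made concrete with a small sketch that splits a Hugging Face-style model ID into generation, size tier, modality, and variant. The `LlamaVariant` class, the `parse_model_id` helper, and the regular expression are hypothetical names introduced for illustration; the pattern only covers the ID shapes that appear in this cheat sheet, and treating every generation-4 ID as vision-capable is an assumption based on Llama 4 being described as natively multimodal.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaVariant:
    generation: str  # e.g. "3", "3.1", "4"
    size: str        # size tier as written in the ID, e.g. "8B", "Scout-17B-16E"
    vision: bool     # True when the ID has a "Vision" tag or is generation 4
    instruct: bool   # True for instruction-tuned variants, False for base


# Illustrative pattern only: it handles IDs such as
# "meta-llama/Meta-Llama-3-8B-Instruct" and "meta-llama/Llama-3.2-11B-Vision-Instruct",
# not every repository name that exists.
_ID_PATTERN = re.compile(
    r"^meta-llama/(?:Meta-)?Llama-(?P<gen>\d+(?:\.\d+)?)-(?P<rest>.+)$"
)


def parse_model_id(model_id: str) -> LlamaVariant:
    """Split a model ID into the four axes used in this cheat sheet."""
    match = _ID_PATTERN.match(model_id)
    if match is None:
        raise ValueError(f"unrecognized model ID: {model_id}")
    generation = match.group("gen")
    parts = match.group("rest").split("-")
    # Assumption: generation-4 IDs carry no "Vision" tag but are multimodal.
    vision = "Vision" in parts or generation.split(".")[0] == "4"
    instruct = "Instruct" in parts
    size = "-".join(p for p in parts if p not in ("Vision", "Instruct"))
    return LlamaVariant(generation, size, vision, instruct)


if __name__ == "__main__":
    for model_id in (
        "meta-llama/Meta-Llama-3-8B-Instruct",
        "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    ):
        print(parse_model_id(model_id))
```

Keeping the four axes as separate fields makes it easy to branch on them later, for example choosing a prompt template by `instruct` or an image preprocessing path by `vision`.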
What This Cheat Sheet Covers
This cheat sheet spans 15 focused tables and 98 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Model Generations and Release Timeline
Each Llama generation introduced a major architectural or capability leap. Knowing which generation a model belongs to immediately signals its context window, multimodal support, and license terms.
| Model | Example | Description |
|---|---|---|
| Llama 3 | meta-llama/Meta-Llama-3-8B-Instruct | • 8B and 70B dense decoder-only transformers • 128K-token vocabulary (up from 32K in Llama 2) • 8K context • trained on ~15T tokens • GQA across all sizes • strong reasoning and code |
| Llama 3.1 | meta-llama/Llama-3.1-405B-Instruct | • Adds a 405B model • expands context to 128K tokens • adds multilingual support (8 languages) and native tool/function calling • 405B intended as a teacher model for distillation • same dense architecture as Llama 3 |
| Llama 3.2 | meta-llama/Llama-3.2-11B-Vision-Instruct | • Adds 1B and 3B lightweight edge models plus 11B and 90B vision models • first multimodal Llama • 128K context • vision models use a cross-attention adapter architecture |
| Llama 3.3 | meta-llama/Llama-3.3-70B-Instruct | • Single-size 70B release with performance close to Llama 3.1 405B at 70B compute cost • 128K context • 8-language support • text-only • released Dec 6, 2024 |
| Llama 4 Scout | meta-llama/Llama-4-Scout-17B-16E-Instruct | • First MoE Llama • 17B active / 109B total params • 16 experts • 10M-token context window (iRoPE architecture) • natively multimodal • fits on a single H100 with INT4 quantization |
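
The "fits on single H100 with INT4" claim for Llama 4 Scout can be checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate under stated assumptions, not a sizing tool: it counts weights only (no KV cache, activations, or runtime overhead), assumes the 80 GB H100 variant, and uses the total parameter count rather than the active count, on the assumption that all experts of an MoE model stay resident in memory. The `weight_memory_gb` helper and the `BYTES_PER_PARAM` table are hypothetical names.

```python
# Approximate bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

# Assumption: the 80 GB H100 variant.
H100_MEMORY_GB = 80


def weight_memory_gb(total_params_billion: float, precision: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    """
    return total_params_billion * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    scout_total_params_billion = 109  # total, not the 17B active per token
    for precision in ("bf16", "int8", "int4"):
        needed = weight_memory_gb(scout_total_params_billion, precision)
        verdict = "fits" if needed < H100_MEMORY_GB else "does not fit"
        print(f"Scout @ {precision}: ~{needed:.1f} GB of weights, {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions only the INT4 weights (about 54.5 GB) leave headroom on one 80 GB card; the remaining memory still has to cover the KV cache, which grows with context length.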