Meta's Llama (Large Language Model Meta AI) is a family of open-weight large language models that has evolved from a research-only release in 2023 into one of the most widely deployed model families in the world. Llama models range from compact 1B-parameter edge models to massive mixture-of-experts architectures exceeding 400B total parameters, so deployment targets span everything from a single smartphone to multi-GPU server clusters. What makes the family distinctive is its open weights under a commercial-friendly community license, which lets developers fine-tune, self-host, and build products without vendor lock-in. Understanding the family requires tracking several parallel axes at once: model generation (3.1, 3.2, 3.3, 4), size tier, modality (text-only vs. vision), and variant type (base vs. instruct). Each combination has distinct capabilities, prompt formats, and deployment requirements.
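As a rough illustration of those axes, the sketch below pulls generation, size, modality, and variant out of a Hugging Face style model ID. The `LlamaVariant` class and `parse_model_id` helper are hypothetical names introduced here, and the naming rules they encode (a `-Instruct` suffix marks instruct variants, a `Vision` token or generation 4 marks multimodal models) are assumptions inferred from the IDs listed in this cheat sheet, not an official specification.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaVariant:
    generation: str  # e.g. "3.1"
    size: str        # e.g. "8B"; for MoE names this is the active-parameter figure
    modality: str    # "vision" or "text"
    variant: str     # "instruct" or "base"


def parse_model_id(model_id: str) -> LlamaVariant:
    """Best-effort parse of a Llama repo ID into the four axes (heuristic)."""
    name = model_id.split("/")[-1]  # drop the "meta-llama/" organization prefix
    gen = re.search(r"Llama-(\d+(?:\.\d+)?)", name)
    size = re.search(r"(\d+(?:\.\d+)?B)", name)
    if gen is None or size is None:
        raise ValueError(f"unrecognized Llama model ID: {model_id!r}")
    generation = gen.group(1)
    # Assumption: generation 4 models are natively multimodal even though
    # their names carry no "Vision" token.
    is_vision = "Vision" in name or int(generation.split(".")[0]) >= 4
    return LlamaVariant(
        generation=generation,
        size=size.group(1),
        modality="vision" if is_vision else "text",
        variant="instruct" if name.endswith("-Instruct") else "base",
    )


if __name__ == "__main__":
    for mid in (
        "meta-llama/Meta-Llama-3-8B-Instruct",
        "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    ):
        print(mid, "->", parse_model_id(mid))
```

A parser like this is only a convenience for routing or logging; the model card remains the authoritative source for each model's capabilities and prompt format.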
What This Cheat Sheet Covers
This cheat sheet spans 15 focused tables and 98 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Model Generations and Release Timeline
Each Llama generation introduced a major architectural or capability leap. Knowing which generation a model belongs to immediately signals its context window, multimodal support, and license terms.
| Model | Example | Description |
|---|---|---|
| Llama 3 | `meta-llama/Meta-Llama-3-8B-Instruct` | 8B and 70B dense decoder-only transformers; 128K-token vocabulary (up from 32K in Llama 2); 8K context; trained on ~15T tokens; GQA across all sizes; strong reasoning and code. |
| Llama 3.1 | `meta-llama/Llama-3.1-405B-Instruct` | Adds a 405B model; expands context to 128K tokens; multilingual support (8 languages); native tool/function calling; 405B intended as a teacher model for distillation; same dense architecture as Llama 3. |
| Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision-Instruct` | Adds 1B and 3B lightweight edge models plus 11B and 90B vision models; first multimodal Llama; 128K context; vision models use a cross-attention adapter architecture. |
| Llama 3.3 | `meta-llama/Llama-3.3-70B-Instruct` | Single-size 70B release with performance close to Llama 3.1 405B at 70B compute cost; 128K context; 8-language support; text-only; released Dec 6, 2024. |
| Llama 4 Scout | `meta-llama/Llama-4-Scout-17B-16E-Instruct` | First MoE Llama; 17B active / 109B total params; 16 experts; 10M-token context window (iRoPE architecture); natively multimodal; fits on a single H100 with INT4. |
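
The "fits on a single H100 with INT4" note for Llama 4 Scout can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate of weight memory only: it ignores KV cache, activations, and runtime overhead, and it assumes an 80 GB H100. The `weight_memory_gb` helper and the `H100_MEMORY_GB` constant are names introduced here for illustration.

```python
H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_memory_gb(total_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # An MoE model must keep all experts resident, so the total parameter
    # count (109B), not the active count (17B), drives the memory footprint.
    for label, bits in (("BF16", 16), ("INT8", 8), ("INT4", 4)):
        gb = weight_memory_gb(109, bits)
        verdict = "fits" if gb < H100_MEMORY_GB else "does not fit"
        print(f"Scout @ {label}: {gb:.1f} GB of weights, {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions the INT4 weights come to about 54.5 GB, which leaves headroom on an 80 GB card, while the BF16 weights (about 218 GB) need several GPUs.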