Llama Models (Meta) Cheat Sheet

Updated 2026-05-21

Meta's Llama (Large Language Model Meta AI) is a family of open-weight large language models that has evolved from a research-only release in 2023 into one of the most widely deployed model families in the world. Llama models range from compact 1B-parameter edge models to massive mixture-of-experts architectures exceeding 400B total parameters, enabling deployment on a single smartphone all the way to multi-GPU server clusters. What makes the family distinctive is open weights under a commercial-friendly community license, allowing developers to fine-tune, self-host, and build products without vendor lock-in. Understanding the family requires tracking several parallel axes at once: model generation (3.1, 3.2, 3.3, 4), size tier, modality (text-only vs. vision), and variant type (base vs. instruct). Each combination has distinct capabilities, prompt formats, and deployment requirements.
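
The axes above are mostly visible in a model's repository ID. The following is a minimal sketch, not an official parser: the `LlamaModelId` class, the `parse_llama_id` function, and the regular expression are hypothetical names written for illustration, and the pattern is only checked against the example IDs listed in Table 1. It also assumes that any generation 4 model is multimodal, which is what Table 1 states for Llama 4 Scout.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern covering the example IDs in Table 1 only.
_ID_PATTERN = re.compile(
    r"^(?:meta-llama/)?(?:Meta-)?Llama-"
    r"(?P<generation>\d+(?:\.\d+)?)"      # e.g. 3, 3.1, 3.2, 3.3, 4
    r"(?:-(?P<codename>[A-Za-z]+))?"      # e.g. Scout (Llama 4 only)
    r"-(?P<size>\d+(?:\.\d+)?B)"          # e.g. 8B, 70B, 405B
    r"(?:-(?P<experts>\d+)E)?"            # e.g. 16E for MoE models
    r"(?P<vision>-Vision)?"               # vision marker in 3.2 IDs
    r"(?P<instruct>-Instruct)?$"          # absent for base models
)


@dataclass(frozen=True)
class LlamaModelId:
    generation: str  # model generation, kept as a string ("3.1")
    size: str        # size tier; for MoE IDs this is the active size
    modality: str    # "vision" or "text"
    variant: str     # "instruct" or "base"


def parse_llama_id(model_id: str) -> LlamaModelId:
    """Split a Llama repository ID into the four axes."""
    match = _ID_PATTERN.match(model_id)
    if match is None:
        raise ValueError(f"unrecognized Llama model ID: {model_id!r}")
    generation = match.group("generation")
    # Assumption: generation 4 is natively multimodal even though
    # its ID carries no "-Vision" marker.
    is_vision = bool(match.group("vision")) or generation.startswith("4")
    return LlamaModelId(
        generation=generation,
        size=match.group("size"),
        modality="vision" if is_vision else "text",
        variant="instruct" if match.group("instruct") else "base",
    )


if __name__ == "__main__":
    for example in (
        "meta-llama/Meta-Llama-3-8B-Instruct",
        "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    ):
        print(example, "->", parse_llama_id(example))
```

IDs outside the pattern raise `ValueError` rather than being guessed at, since naming conventions have shifted between generations (for example, the `Meta-` prefix appears only in the Llama 3 ID).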

What This Cheat Sheet Covers

This cheat sheet spans 15 focused tables and 98 indexed concepts. The table-by-table outline below runs from foundational concepts to advanced details.

  • Table 1: Model Generations and Release Timeline
  • Table 2: Model Sizes and Parameter Tiers
  • Table 3: Base vs. Instruct Variants
  • Table 4: Context Windows
  • Table 5: Architecture and Technical Design
  • Table 6: Llama 3.x Prompt Format and Special Tokens
  • Table 7: Llama 4 Prompt Format and Special Tokens
  • Table 8: Vision and Multimodal Capabilities
  • Table 9: Code Llama Family
  • Table 10: Llama Guard - Safety Classification
  • Table 11: Purple Llama Safety Suite
  • Table 12: Hosting and Deployment Options
  • Table 13: Quantization Formats
  • Table 14: Fine-Tuning Methods
  • Table 15: License and Commercial Terms

Table 1: Model Generations and Release Timeline

Each Llama generation introduced a major architectural or capability leap. Knowing which generation a model belongs to immediately signals its context window, multimodal support, and license terms.

| Model | Example | Description |
| --- | --- | --- |
| Llama 3 (April 2024) | meta-llama/Meta-Llama-3-8B-Instruct | 8B and 70B dense decoder-only transformers; 128K-token vocabulary (up from 32K in Llama 2); 8K context; trained on ~15T tokens; GQA across all sizes; strong reasoning and code. |
| Llama 3.1 (July 2024) | meta-llama/Llama-3.1-405B-Instruct | Adds a 405B size; expands context to 128K tokens; multilingual support (8 languages); native tool/function calling; 405B intended as a teacher model for distillation; same dense architecture as Llama 3. |
| Llama 3.2 (September 2024) | meta-llama/Llama-3.2-11B-Vision-Instruct | Adds 1B and 3B lightweight edge models plus 11B and 90B vision models; first multimodal Llama; 128K context; vision models use a cross-attention adapter architecture. |
| Llama 3.3 (December 2024) | meta-llama/Llama-3.3-70B-Instruct | Single-size 70B release approaching 405B-class performance at 70B compute cost; 128K context; 8-language support; text-only; released Dec 6, 2024. |
| Llama 4 Scout (April 2025) | meta-llama/Llama-4-Scout-17B-16E-Instruct | First MoE Llama; 17B active / 109B total parameters; 16 experts; 10M-token context window (iRoPE architecture); natively multimodal; fits on a single H100 with INT4 quantization. |
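
The claim that Llama 4 Scout fits on a single H100 with INT4 can be sanity-checked with simple arithmetic. The sketch below is a rough estimate under stated assumptions, not a deployment guide: it counts weight storage only (no KV cache, activations, or quantization metadata), uses decimal gigabytes, and assumes the 80 GB H100 variant. The function and constant names are hypothetical.

```python
# Assumption: the 80 GB H100 variant; other memory sizes exist.
H100_MEMORY_GB = 80.0


def weight_memory_gb(total_params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal GB.

    An MoE model must hold all experts in memory, so the total
    parameter count is used, not the active count.
    """
    bytes_per_weight = bits_per_weight / 8
    # billions of params * bytes per param = billions of bytes = GB
    return total_params_billions * bytes_per_weight


def fits_on_device(weights_gb: float, device_gb: float) -> bool:
    """True if the weights alone are smaller than device memory."""
    return weights_gb < device_gb


if __name__ == "__main__":
    scout_total_b = 109  # total parameters in billions, from Table 1
    for label, bits in (("BF16", 16), ("INT8", 8), ("INT4", 4)):
        size_gb = weight_memory_gb(scout_total_b, bits)
        verdict = "fits" if fits_on_device(size_gb, H100_MEMORY_GB) else "does not fit"
        print(f"{label}: {size_gb:.1f} GB of weights, {verdict} in {H100_MEMORY_GB:.0f} GB")
```

Under these assumptions the INT4 weights come to about 54.5 GB, leaving headroom on an 80 GB card, while 16-bit weights (about 218 GB) do not fit on a single GPU. Real headroom is smaller, because long contexts add a KV cache that this estimate ignores.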
