Llama Models (Meta) Cheat Sheet

Updated 2026-05-21

Meta's Llama (Large Language Model Meta AI) is a family of open-weight large language models that has evolved from a research-only release in 2023 into one of the most widely deployed model families in the world. Llama models range from compact 1B-parameter edge models to massive mixture-of-experts architectures exceeding 400B total parameters, so deployments span everything from a single smartphone to multi-GPU server clusters. What makes the family distinctive is its open weights under a commercial-friendly community license, which lets developers fine-tune, self-host, and build products without vendor lock-in. Understanding the family requires tracking several parallel axes at once: model generation (3.1, 3.2, 3.3, 4), size tier, modality (text-only vs. vision), and variant type (base vs. instruct). Each combination has distinct capabilities, prompt formats, and deployment requirements.
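As a rough illustration of those axes, the sketch below pulls generation, size tier, modality, and variant out of a Hugging Face style model ID. The names `LlamaAxes` and `parse_model_id` are hypothetical helpers, not part of any Meta or Hugging Face library, and the parsing rules are assumptions that only cover ID shapes like the ones listed in Table 1. Treating every generation-4 ID as multimodal is likewise an assumption, based on Table 1 describing Llama 4 Scout as natively multimodal.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaAxes:
    generation: str   # e.g. "3", "3.1", "4"
    size: str         # first size token in the ID, e.g. "70B"
    multimodal: bool  # "Vision" tag present, or a generation-4 ID (assumption)
    instruct: bool    # True for instruct variants, False for base


def parse_model_id(model_id: str) -> LlamaAxes:
    """Split a model ID such as 'meta-llama/Llama-3.2-11B-Vision-Instruct'."""
    name = model_id.split("/")[-1]
    # Llama 3 IDs carry a "Meta-" prefix that later generations dropped.
    if name.startswith("Meta-"):
        name = name[len("Meta-"):]
    parts = name.split("-")
    if len(parts) < 3 or parts[0] != "Llama":
        raise ValueError(f"unrecognized Llama model ID: {model_id!r}")

    generation = parts[1]
    # For MoE IDs such as "Scout-17B-16E" this picks the active-parameter count.
    size = next((p for p in parts[2:] if re.fullmatch(r"\d+B", p)), None)
    if size is None:
        raise ValueError(f"no size token found in: {model_id!r}")

    # Assumption: generation-4 IDs have no "Vision" tag but are multimodal.
    multimodal = "Vision" in parts or generation.split(".")[0] == "4"
    return LlamaAxes(generation, size, multimodal, "Instruct" in parts)


if __name__ == "__main__":
    for mid in (
        "meta-llama/Meta-Llama-3-8B-Instruct",
        "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    ):
        print(mid, "->", parse_model_id(mid))
```

String parsing like this is brittle by design; for anything beyond quick triage, the model card remains the authoritative source for each axis.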

What This Cheat Sheet Covers

This topic spans 15 focused tables and 98 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

Table 1: Model Generations and Release Timeline
Table 2: Model Sizes and Parameter Tiers
Table 3: Base vs. Instruct Variants
Table 4: Context Windows
Table 5: Architecture and Technical Design
Table 6: Llama 3.x Prompt Format and Special Tokens
Table 7: Llama 4 Prompt Format and Special Tokens
Table 8: Vision and Multimodal Capabilities
Table 9: Code Llama Family
Table 10: Llama Guard — Safety Classification
Table 11: Purple Llama Safety Suite
Table 12: Hosting and Deployment Options
Table 13: Quantization Formats
Table 14: Fine-Tuning Methods
Table 15: License and Commercial Terms

Table 1: Model Generations and Release Timeline

Each Llama generation introduced a major architectural or capability leap. Knowing which generation a model belongs to immediately signals its context window, multimodal support, and license terms.

Model | Example | Description
Llama 3 (April 2024)
meta-llama/Meta-Llama-3-8B-Instruct
• 8B and 70B dense decoder-only transformers
• 128K-token vocabulary (up from 32K in Llama 2)
• 8K context
• trained on ~15T tokens
• GQA across all sizes
• strong reasoning and code
Llama 3.1 (July 2024)
meta-llama/Llama-3.1-405B-Instruct
• Adds a 405B model
• expands context to 128K tokens
• multilingual support (8 languages)
• native tool/function calling
• 405B intended as teacher model for distillation
• same dense architecture as Llama 3
Llama 3.2 (September 2024)
meta-llama/Llama-3.2-11B-Vision-Instruct
• Adds 1B and 3B lightweight edge models + 11B and 90B vision models
• first multimodal Llama
• 128K context
• vision models use cross-attention adapter architecture
Llama 3.3 (December 2024)
meta-llama/Llama-3.3-70B-Instruct
• Single-size 70B release that approaches Llama 3.1 405B performance at 70B compute cost
• 128K context
• 8-language support
• text-only
• released December 6, 2024
Llama 4 Scout (April 2025)
meta-llama/Llama-4-Scout-17B-16E-Instruct
• First MoE Llama
• 17B active / 109B total params
• 16 experts
• 10M-token context window (iRoPE architecture)
• natively multimodal
• fits on a single H100 with INT4 quantization
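The single-H100 claim for Llama 4 Scout can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate under stated assumptions: it counts weight memory only (no KV cache, activations, or runtime overhead), it uses the 109B total-parameter figure from the table because a mixture-of-experts model must keep every expert resident even though only 17B parameters are active per token, and it assumes the 80 GB H100 variant. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(total_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


SCOUT_TOTAL_PARAMS = 109e9  # from the table: 109B total parameters
H100_MEMORY_GB = 80         # assumption: the 80 GB H100 variant

if __name__ == "__main__":
    for label, bits in (("BF16", 16), ("INT8", 8), ("INT4", 4)):
        gb = weight_memory_gb(SCOUT_TOTAL_PARAMS, bits)
        verdict = "fits" if gb < H100_MEMORY_GB else "does not fit"
        print(f"{label}: ~{gb:.1f} GB of weights, {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions, only the INT4 weights (about 54.5 GB) leave headroom on one 80 GB card. How much of the 10M-token context is usable in practice depends on the memory left for the KV cache, which this estimate does not model.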
