Mistral AI Models Cheat Sheet

Updated 2026-05-21

Next Topic: Model Quantization Cheat Sheet

Mistral AI is a French company that has rapidly built one of the most diverse open and proprietary model families in generative AI, spanning dense transformers, sparse Mixture-of-Experts (MoE) architectures, code specialists, vision-language models, reasoning models, and edge-optimized small language models. Practitioners choose Mistral because many of its most capable models are Apache 2.0 licensed — deployable on your own infrastructure with no vendor lock-in — while a curated set of premier models is available via La Plateforme, the company's unified API and developer console. The key mental model: Mistral versions its models with a YY.MM suffix (e.g., mistral-large-2512), always exposes a -latest alias for the current recommended version, and separates "Open" models (self-hostable) from "Premier" models (API-only) — knowing this distinction upfront prevents confusion when reading the docs.

What This Cheat Sheet Covers

This topic spans 16 focused tables and 116 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Model Family OverviewTable 2: Core Architecture TechniquesTable 3: La Plateforme API — EndpointsTable 4: Chat Completion ParametersTable 5: Function CallingTable 6: Structured OutputTable 7: Vision and MultimodalTable 8: Code Generation and FIMTable 9: EmbeddingsTable 10: Reasoning (Magistral)Table 11: Fine-TuningTable 12: Batch ProcessingTable 13: Python SDKTable 14: Moderation and SafetyTable 15: Third-Party HostingTable 16: Le Chat and Vibe

Table 1: Model Family Overview

Mistral's model catalog spans six major families, each optimized for a distinct workload. Understanding where each family sits — from frontier general-purpose models down to sub-10B edge deployments — is the prerequisite for choosing the right model ID in any API call.

Model	Example	Description
Mistral Large 3	`model="mistral-large-latest"`	• Flagship open-weight general-purpose model (v25.12) • 675B total parameters, ~41B active • 262K context • multimodal text + vision • Apache 2.0.
Mistral Medium 3.5	`model="mistral-medium-2504"`	• 128B dense frontier model for agents and coding (v26.04) • 256K context • 77.6% SWE-Bench Verified • multimodal • supports configurable reasoning effort
Mistral Small 4	`model="mistral-small-latest"`	• 119B MoE, 6B active • hybrid instruct + reasoning + coding + vision in one model (v26.03) • 262K context • Apache 2.0 • replaces three separate models
Mixtral 8x22B	`model="open-mixtral-8x22b"`	• Sparse MoE: 141B total params, 39B active • 64K context • strongest open MoE • superior multilingual and coding benchmarks
Mixtral 8x7B	`model="open-mixtral-8x7b"`	• Sparse MoE: 46.7B total params, 12.9B active • 32K context • cost/quality sweet spot for open deployments • Apache 2.0.
Magistral Medium 1.2	`model="magistral-medium-latest"`	• Reasoning-first premier model • trained with RL alone (no distillation) • 73.6% AIME2024 • 128K context • chain-of-thought traces always exposed
Magistral Small	`model="magistral-small-latest"`	• 24B open reasoning model • 70.7% AIME2024 • runs on a single RTX 4090 (quantized) • Apache 2.0.

Table 1: Model Family Overview

Model	Example	Description
Mistral Large 3	`model="mistral-large-latest"`	• Flagship open-weight general-purpose model (v25.12) • 675B total parameters, ~41B active • 262K context • multimodal text + vision • Apache 2.0.
Mistral Medium 3.5	`model="mistral-medium-2504"`	• 128B dense frontier model for agents and coding (v26.04) • 256K context • 77.6% SWE-Bench Verified • multimodal • supports configurable reasoning effort
Mistral Small 4	`model="mistral-small-latest"`	• 119B MoE, 6B active • hybrid instruct + reasoning + coding + vision in one model (v26.03) • 262K context • Apache 2.0 • replaces three separate models
Mixtral 8x22B	`model="open-mixtral-8x22b"`	• Sparse MoE: 141B total params, 39B active • 64K context • strongest open MoE • superior multilingual and coding benchmarks
Mixtral 8x7B	`model="open-mixtral-8x7b"`	• Sparse MoE: 46.7B total params, 12.9B active • 32K context • cost/quality sweet spot for open deployments • Apache 2.0.
Magistral Medium 1.2	`model="magistral-medium-latest"`	• Reasoning-first premier model • trained with RL alone (no distillation) • 73.6% AIME2024 • 128K context • chain-of-thought traces always exposed
Magistral Small	`model="magistral-small-latest"`	• 24B open reasoning model • 70.7% AIME2024 • runs on a single RTX 4090 (quantized) • Apache 2.0.