Mistral AI is a French company that has rapidly built one of the most diverse open and proprietary model families in generative AI. The lineup spans dense transformers, sparse Mixture-of-Experts (MoE) architectures, code specialists, vision-language models, reasoning models, and edge-optimized small language models. Practitioners choose Mistral because many of its most capable models are Apache 2.0 licensed, so you can deploy them on your own infrastructure without vendor lock-in. A curated set of premier models is also available through La Plateforme, the company's unified API and developer console. The key mental model: Mistral versions its models with a YY.MM suffix (e.g., mistral-large-2512), always exposes a -latest alias for the current recommended version, and separates "Open" models (self-hostable) from "Premier" models (API-only). Knowing this distinction upfront prevents confusion when reading the docs.
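The YY.MM convention is regular enough to handle in code, for example when logging which pinned version a service is using. The sketch below assumes that every dated ID ends in exactly four digits (YYMM) and that aliases end in `-latest`. `parse_model_id` and `ModelRef` are hypothetical helpers for illustration, not part of any Mistral SDK. IDs that carry neither suffix, such as `open-mixtral-8x22b`, are deliberately rejected.

```python
import re
from typing import NamedTuple, Optional

# Assumption: a model ID is "<family>-<suffix>", where the suffix is either
# "latest" or exactly four digits YYMM (e.g. "mistral-large-2512").
_MODEL_ID = re.compile(r"^(?P<family>[a-z0-9-]+?)-(?P<suffix>latest|\d{4})$")


class ModelRef(NamedTuple):
    family: str             # e.g. "mistral-large"
    version: Optional[str]  # "25.12" for pinned IDs, None for aliases
    is_alias: bool          # True when the ID ends in "-latest"


def parse_model_id(model_id: str) -> ModelRef:
    """Split a Mistral-style model ID into its family and YY.MM version."""
    m = _MODEL_ID.match(model_id)
    if m is None:
        raise ValueError(f"unrecognized model ID: {model_id!r}")
    family, suffix = m.group("family"), m.group("suffix")
    if suffix == "latest":
        # Alias: the concrete version it points to is decided server-side.
        return ModelRef(family, None, True)
    yy, mm = suffix[:2], suffix[2:]
    if not 1 <= int(mm) <= 12:
        raise ValueError(f"month out of range in {model_id!r}")
    return ModelRef(family, f"{yy}.{mm}", False)


print(parse_model_id("mistral-large-2512"))
print(parse_model_id("magistral-small-latest"))
```

In practice, pinning a dated ID keeps behavior stable when the alias moves to a newer release, while `-latest` is convenient for experimentation.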
What This Cheat Sheet Covers
This cheat sheet spans 16 focused tables and 116 indexed concepts, outlined table by table from foundational concepts through advanced details.
Table 1: Model Family Overview
Mistral's model catalog spans six major families, each optimized for a distinct workload. Understanding where each family sits, from frontier general-purpose models down to sub-10B edge deployments, is the prerequisite for choosing the right model ID in any API call.
| Model | Example | Description |
|---|---|---|
| Mistral Large | `model="mistral-large-latest"` | Flagship open-weight general-purpose model (v25.12); 675B total parameters, ~41B active; 262K context; multimodal (text + vision); Apache 2.0. |
| Mistral Medium | `model="mistral-medium-2504"` | 128B dense frontier model for agents and coding (v26.04); 256K context; 77.6% on SWE-Bench Verified; multimodal; supports configurable reasoning effort. |
| Mistral Small | `model="mistral-small-latest"` | 119B MoE with 6B active; combines instruct, reasoning, coding, and vision in one hybrid model (v26.03); 262K context; Apache 2.0; replaces three previously separate models. |
| Mixtral 8x22B | `model="open-mixtral-8x22b"` | Sparse MoE: 141B total parameters, 39B active; 64K context; the larger Mixtral, with stronger multilingual and coding benchmark results than Mixtral 8x7B. |
| Mixtral 8x7B | `model="open-mixtral-8x7b"` | Sparse MoE: 46.7B total parameters, 12.9B active; 32K context; cost/quality sweet spot for open deployments; Apache 2.0. |
| Magistral Medium | `model="magistral-medium-latest"` | Reasoning-first premier model; trained with reinforcement learning alone (no distillation); 73.6% on AIME 2024; 128K context; chain-of-thought traces always exposed. |
| Magistral Small | `model="magistral-small-latest"` | 24B open reasoning model; 70.7% on AIME 2024; runs on a single RTX 4090 when quantized; Apache 2.0. |
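The total/active split in the sparse MoE rows is what makes these models cheap per token yet heavy in memory: each token passes through only the active parameters, but every expert must stay loaded. The back-of-the-envelope sketch below uses the parameter counts from the table. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token and bf16 weights at 2 bytes per parameter, and it ignores attention over long contexts, the KV cache, and activations. `MoESpec` is a hypothetical helper, not a Mistral API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoESpec:
    name: str
    total_b: float   # total parameters in billions (from Table 1)
    active_b: float  # parameters used per token in billions (from Table 1)

    @property
    def active_fraction(self) -> float:
        # Share of the weights that actually participate in each token.
        return self.active_b / self.total_b

    def flops_per_token(self) -> float:
        # Rule of thumb: ~2 FLOPs (one multiply, one add) per active
        # parameter per generated token; ignores attention over the context.
        return 2 * self.active_b * 1e9

    def weight_memory_gb(self, bytes_per_param: float = 2.0) -> float:
        # All experts must be resident, so memory scales with TOTAL params.
        # Default of 2.0 bytes/param assumes unquantized bf16/fp16 weights.
        return self.total_b * bytes_per_param


MODELS = [
    MoESpec("mistral-large-latest", 675, 41),
    MoESpec("mistral-small-latest", 119, 6),
    MoESpec("open-mixtral-8x22b", 141, 39),
    MoESpec("open-mixtral-8x7b", 46.7, 12.9),
]

for m in MODELS:
    print(f"{m.name:22s} active {m.active_fraction:6.1%}  "
          f"~{m.flops_per_token() / 1e9:5.1f} GFLOPs/token  "
          f"~{m.weight_memory_gb():7.1f} GB weights (bf16)")
```

For Mixtral 8x7B this gives roughly 28% of weights active per token, yet about 93 GB of bf16 weights to hold in memory. This is why quantization matters more for self-hosting MoE models than their per-token compute alone would suggest.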