Mistral AI is a French company that has rapidly built one of the most diverse open and proprietary model families in generative AI, spanning dense transformers, sparse Mixture-of-Experts (MoE) architectures, code specialists, vision-language models, reasoning models, and edge-optimized small language models. Practitioners choose Mistral because many of its most capable models are Apache 2.0 licensed — deployable on your own infrastructure with no vendor lock-in — while a curated set of premier models is available via La Plateforme, the company's unified API and developer console. The key mental model: Mistral versions its models with a YY.MM suffix (e.g., mistral-large-2512), always exposes a -latest alias for the current recommended version, and separates "Open" models (self-hostable) from "Premier" models (API-only) — knowing this distinction upfront prevents confusion when reading the docs.
What This Cheat Sheet Covers
This topic spans 16 focused tables and 116 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Model Family Overview
Mistral's model catalog spans six major families, each optimized for a distinct workload. Understanding where each family sits — from frontier general-purpose models down to sub-10B edge deployments — is the prerequisite for choosing the right model ID in any API call.
| Model | Example | Description |
|---|---|---|
model="mistral-large-latest" | • Flagship open-weight general-purpose model (v25.12) • 675B total parameters, ~41B active • 262K context • multimodal text + vision • Apache 2.0. | |
model="mistral-medium-2504" | • 128B dense frontier model for agents and coding (v26.04) • 256K context • 77.6% SWE-Bench Verified • multimodal • supports configurable reasoning effort | |
model="mistral-small-latest" | • 119B MoE, 6B active • hybrid instruct + reasoning + coding + vision in one model (v26.03) • 262K context • Apache 2.0 • replaces three separate models | |
model="open-mixtral-8x22b" | • Sparse MoE: 141B total params, 39B active • 64K context • strongest open MoE • superior multilingual and coding benchmarks | |
model="open-mixtral-8x7b" | • Sparse MoE: 46.7B total params, 12.9B active • 32K context • cost/quality sweet spot for open deployments • Apache 2.0. | |
model="magistral-medium-latest" | • Reasoning-first premier model • trained with RL alone (no distillation) • 73.6% AIME2024 • 128K context • chain-of-thought traces always exposed | |
model="magistral-small-latest" | • 24B open reasoning model • 70.7% AIME2024 • runs on a single RTX 4090 (quantized) • Apache 2.0. |