DeepSeek (by DeepSeek-AI) and Qwen (by Alibaba Cloud) are the two most prominent open-weight large language model families from Chinese AI labs, and together they define much of the frontier of non-proprietary AI in 2025–2026. The flagship models in both families use Mixture-of-Experts (MoE) architectures that activate only a fraction of total parameters per token, which delivers frontier-level performance at a much lower inference cost. The key insight for practitioners: these models are not monolithic. Each family spans general-purpose, reasoning-specialized, code-specialized, and multimodal variants with different prompt formatting, context windows, and licensing terms, so choosing correctly requires understanding the full taxonomy.
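To make "a fraction of total parameters" concrete, the sketch below computes the active-parameter share from the 671B total / 37B active figures listed for DeepSeek's flagship checkpoints in Table 1. The helper name `active_fraction` is illustrative only and does not come from any library.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters used per token in an MoE model.

    Both arguments are parameter counts in billions. This is plain
    arithmetic, not a measurement of real inference cost.
    """
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active_params_b / total_params_b


# Figures from Table 1: 671B total, 37B active per token.
share = active_fraction(671, 37)
print(f"{share:.1%} of parameters are active per token")  # 5.5% of parameters are active per token
```

Per-token compute scales roughly with the active count, but as a rule of thumb all 671B parameters still have to be held in memory to serve the model.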
What This Cheat Sheet Covers
This cheat sheet spans 14 focused tables and 103 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: DeepSeek Model Family Overview
The DeepSeek family spans five distinct model lines, each optimized for a different task profile. Understanding which line to reach for — and why — is the first decision any practitioner must make before deployment.
| Model | Example | Description |
|---|---|---|
| DeepSeek-V3 | `model="deepseek-chat"` (API) | 671B total / 37B active MoE general-purpose model; 128K context; pre-trained on 14.8T tokens; MIT license. |
| DeepSeek-R1 | `model="deepseek-reasoner"` (API) | 671B total / 37B active reasoning model; chain-of-thought via `<think>` blocks; 128K context; MIT license. |
| DeepSeek-V3.1 | toggle via chat template | Hybrid model combining V3 direct answers and R1 chain-of-thought in one 671B checkpoint; 128K context. |
| DeepSeek-R1-0528 | `model="deepseek-ai/DeepSeek-R1-0528"` | Updated R1 checkpoint (May 2025); AIME 2025 score 87.5% vs 70.0% for the original R1; uses ~23K tokens per reasoning trace. |
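Because the reasoning models wrap their chain-of-thought in `<think>` blocks, downstream code often needs to separate the trace from the final answer. The following is a minimal sketch that assumes the raw output contains literal, properly closed `<think>...</think>` tags, as is typical when running the open weights locally. The function name `split_reasoning` is hypothetical, and hosted APIs may return the reasoning in a separate field, in which case this step is unnecessary.

```python
import re

# Non-greedy match so multiple <think> blocks are captured separately;
# DOTALL lets the trace span several lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumes every <think> tag is closed. Unclosed tags are left in the
    answer untouched rather than guessed at.
    """
    reasoning = "\n".join(block.strip() for block in _THINK_RE.findall(text))
    answer = _THINK_RE.sub("", text).strip()
    return reasoning, answer


raw = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # 2 + 2 is 4.
print(answer)     # The answer is 4.
```

Keeping the trace separate makes it easy to log or discard it. With traces of around 23K tokens on R1-0528, dropping them from stored conversation history can noticeably reduce context usage.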