Skip to main content

Menu

LEVEL 0
0/5 XP
HomeAboutTopicsPricingMy VaultStats

Categories

🤖 Artificial Intelligence
☁️ Cloud and Infrastructure
💾 Data and Databases
💼 Professional Skills
🎯 Programming and Development
🔒 Security and Networking
📚 Specialized Topics
HomeAboutTopicsPricingMy VaultStats
LEVEL 0
0/5 XP
GitHub
© 2026 CheatGrid™. All rights reserved.
Privacy PolicyTerms of UseAboutContact

DeepSeek and Qwen Models Cheat Sheet

DeepSeek and Qwen Models Cheat Sheet

Back to Generative AI
Updated 2026-05-21
Next Topic: Diffusion Models Cheat Sheet

DeepSeek (by DeepSeek-AI) and Qwen (by Alibaba Cloud) are the two most prominent open-weight large language model families from Chinese AI labs, collectively defining the frontier of non-proprietary AI in 2025–2026. Both families employ Mixture-of-Experts (MoE) architectures that activate only a fraction of total parameters per token, enabling frontier-level performance at dramatically reduced inference cost. The key insight for practitioners: these models are not monolithic — each family spans general-purpose, reasoning-specialized, code-specialized, and multimodal variants, all with different prompt formatting, context windows, and licensing terms, so choosing correctly requires understanding the full taxonomy.

What This Cheat Sheet Covers

This topic spans 14 focused tables and 103 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: DeepSeek Model Family OverviewTable 2: Qwen Model Family OverviewTable 3: DeepSeek Core Architecture — MoE and AttentionTable 4: Qwen Core Architecture — MoE and ReasoningTable 5: DeepSeek-R1 Distilled VariantsTable 6: Qwen2.5 and Qwen3 Dense Model SizesTable 7: Prompt Formatting — ChatML and Special TokensTable 8: DeepSeek and Qwen API AccessTable 9: Self-Hosting with vLLMTable 10: DeepSeek-Coder-V2 and Qwen Code ModelsTable 11: Qwen Vision-Language (VL) ModelsTable 12: Licensing and Open-Weight StatusTable 13: Recommended Inference ParametersTable 14: Key Benchmarks and Performance Reference

Table 1: DeepSeek Model Family Overview

The DeepSeek family spans five distinct model lines, each optimized for a different task profile. Understanding which line to reach for — and why — is the first decision any practitioner must make before deployment.

ModelExampleDescription
DeepSeek-V3
model="deepseek-chat" (API)
671B total / 37B active MoE general-purpose model; 128K context; pre-trained on 14.8T tokens; MIT license.
DeepSeek-R1
model="deepseek-reasoner" (API)
671B total / 37B active reasoning model; chain-of-thought via <think> blocks; 128K context; MIT license.
DeepSeek-V3.1
toggle via chat template
Hybrid model combining V3 direct answers and R1 chain-of-thought in one 671B checkpoint; 128K context.
DeepSeek-R1-0528
model="deepseek-ai/DeepSeek-R1-0528"
Updated R1 checkpoint (May 2025); AIME 2025 score 87.5% vs 70.0% original; uses ~23K tokens per reasoning trace.

More in Generative AI

  • DALL-E and Midjourney Cheat Sheet
  • Diffusion Models Cheat Sheet
  • Advanced RAG Patterns and Optimization Cheat Sheet
  • ColBERT and Late Interaction Retrieval Cheat Sheet
  • LlamaIndex Cheat Sheet
  • pgvector for Postgres Vector Search Cheat Sheet
View all 95 topics in Generative AI