© 2026 CheatGrid™. All rights reserved.
Small Language Models (SLMs) Cheat Sheet

Updated 2026-05-18

Small Language Models (SLMs) are compact AI models, typically in the 1B-13B parameter range, designed for efficient deployment on edge devices and in resource-constrained environments. Unlike larger models that usually depend on cloud infrastructure, SLMs can run inference on-device, which removes the network round trip, lowers latency, and keeps data local. That makes them well suited to mobile, IoT, and offline applications. The critical insight: SLMs trade broad general knowledge for domain-specific expertise and efficiency. On the tasks they target, they can reach roughly 70-90% of LLM performance at a fraction of the resource cost, using techniques such as quantization, distillation, and pruning.

What This Cheat Sheet Covers

This topic spans 12 focused tables and 89 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

  • Table 1: Core SLM Characteristics
  • Table 2: Major SLM Model Families
  • Table 3: Model Compression Techniques
  • Table 4: Quantization Methods
  • Table 5: Knowledge Distillation Approaches
  • Table 6: Parameter-Efficient Fine-Tuning (PEFT)
  • Table 7: On-Device Deployment Frameworks
  • Table 8: Hardware and Memory Requirements
  • Table 9: Inference Optimization Techniques
  • Table 10: Evaluation and Benchmarking
  • Table 11: Domain Specialization Strategies
  • Table 12: SLM vs LLM Decision Framework

Table 1: Core SLM Characteristics

Understanding what defines a small language model and how size correlates with deployment constraints helps determine when SLMs are the right choice over large models.

| Characteristic | Example | Description |
| --- | --- | --- |
| Parameter Count | 1B-13B parameters | Typically ranges from 100 million to 13 billion parameters; models above 13B are generally classified as LLMs. Smaller parameter counts enable faster inference and a lower memory footprint. |
| Model Size | FP16: ~2 GB (1B) to ~26 GB (13B); INT4: ~0.5 GB (1B) to ~6.5 GB (13B) | Size on disk and in memory depends on precision: FP16 needs ~2 bytes per parameter, INT4 ~0.5 bytes. These figures cover weights only; the KV cache and activations add to the runtime footprint. Critical for determining whether a model fits in device memory. |
| Training Data Volume | 500B-9T tokens | At the upper end, Phi-4 (14B, slightly above the 13B cutoff used here) was trained on about 9 trillion tokens. Smaller models compensate for their size with high-quality, curated datasets and longer training. |
| Inference Latency | <100 ms per token on edge devices | SLMs can reach sub-100 ms per-token latency on mobile CPUs/GPUs, depending on model size, precision, and hardware. Often cited as 2-5x faster than streaming from a cloud LLM, because the network overhead is eliminated. |
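
The Model Size row reduces to simple arithmetic: parameter count times bytes per parameter. The sketch below reproduces the table's figures and adds a rough "does it fit" check. The helper names (`weight_size_gb`, `fits_in_memory`) and the 20% overhead allowance for KV cache, activations, and runtime are illustrative assumptions, not values from this cheat sheet; real overhead varies with context length and inference framework.

```python
# Approximate storage cost per parameter, weights only.
# fp16 and int4 values come from the table; fp32 and int8 follow the same rule.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_size_gb(num_params: float, precision: str) -> float:
    """Estimate weight storage in GB, using 1 GB = 1e9 bytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_in_memory(
    num_params: float,
    precision: str,
    device_gb: float,
    overhead_fraction: float = 0.2,  # assumed allowance, not a measured value
) -> bool:
    """Rough check: weights plus an assumed overhead must fit in device memory."""
    required_gb = weight_size_gb(num_params, precision) * (1 + overhead_fraction)
    return required_gb <= device_gb


if __name__ == "__main__":
    for params, label in ((1e9, "1B"), (13e9, "13B")):
        for precision in ("fp16", "int4"):
            size = weight_size_gb(params, precision)
            print(f"{label} @ {precision}: ~{size:.1f} GB of weights")

    # Hypothetical device with 8 GB of memory available to the model.
    print("13B int4 fits in 8 GB:", fits_in_memory(13e9, "int4", device_gb=8.0))
    print("13B fp16 fits in 8 GB:", fits_in_memory(13e9, "fp16", device_gb=8.0))
```

Under these assumptions, the same 13B model that is far too large for an 8 GB device at FP16 becomes a borderline fit at INT4, which is why precision is usually the first thing to adjust when targeting edge hardware.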
