Vector Embeddings Cheat Sheet

Updated 2026-05-28

Vector embeddings transform complex data—text, images, audio, video—into numerical representations in high-dimensional space where semantic similarity becomes geometric proximity. They power modern AI applications from semantic search to retrieval-augmented generation (RAG), enabling machines to understand meaning rather than just match keywords. The key insight: distance metrics in embedding space directly measure conceptual similarity, allowing algorithms to find related items without exact string matching. In 2025–2026, the embedding landscape has been reshaped by all-modality models like Gemini Embedding 2 and Voyage 4, mixture-of-experts architectures that cut active-parameter costs by 40%, and asymmetric retrieval patterns that embed documents and queries with different-sized models sharing a compatible vector space. Understanding dimensions, distance metrics, indexing strategies, quantization tradeoffs, and the full retrieval pipeline is essential for building efficient, production-scale AI systems.

What This Cheat Sheet Covers

This topic spans 14 focused tables and 142 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Embedding Concepts
Table 2: Embedding Models and Architectures
Table 3: Distance and Similarity Metrics
Table 4: Vector Databases and Stores
Table 5: Approximate Nearest Neighbor (ANN) Algorithms
Table 6: Embedding Training and Fine-Tuning Techniques
Table 7: Retrieval Strategies and Applications
Table 8: Embedding Optimization Techniques
Table 9: Pooling Strategies
Table 10: Chunking Strategies for RAG
Table 11: Sparse Retrieval Methods
Table 12: Indexing and Storage Considerations
Table 13: Embedding Evaluation Metrics
Table 14: Advanced Embedding Techniques

Table 1: Core Embedding Concepts

Embeddings represent data as high-dimensional vectors where geometric distance encodes meaning; the concepts here define the vocabulary shared across every retrieval and generative AI system you will encounter. Getting these fundamentals right — especially understanding dense vs sparse and contextualized vs static — prevents the most common architectural mistakes.

Concept	Example	Description
Dense embedding	`[0.23, -0.71, 0.44, ...]` (768 floats)	• A continuous vector with far fewer dimensions than a vocabulary-sized sparse vector (typically hundreds to a few thousand), where most values are non-zero • captures semantic meaning but can miss exact-term matches
Sparse embedding	`{42: 1.3, 1891: 0.7}` (vocab-sized)	• A high-dimensional, mostly-zero vector • each non-zero dimension corresponds to a specific token • enables exact-term retrieval
Sentence embedding	`model.encode("hello world")`	• A single fixed-length vector representing an entire sentence or passage • the standard unit for semantic search and RAG retrieval
Contextualized embedding	`"bank"` in "river bank" ≠ "bank" in "savings bank"	• Vectors computed from full sentence context, so the same word gets a different vector depending on surrounding text (e.g., BERT, GPT)
Static embedding	`word2vec["king"] - word2vec["man"] + word2vec["woman"]`	• A fixed per-word vector regardless of context • fast and memory-efficient but cannot represent polysemy
Semantic similarity	`cos_sim(embed("cat"), embed("feline"))` → 0.91	• Measures conceptual relatedness between two vectors • cosine similarity is the most common metric
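As a rough illustration of the dense, sparse, and semantic-similarity rows above, the sketch below computes cosine similarity between dense vectors and a dot product between sparse vectors stored as index-to-weight maps. The function names `cosine_similarity` and `sparse_dot`, the three-dimensional toy vectors, and the sparse document weights are all assumptions made for this example; none of them come from a real embedding model or library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch; zero vectors have no direction
    return dot / (norm_a * norm_b)


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {dimension_index: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller map
    return sum(weight * b[index] for index, weight in a.items() if index in b)


# Toy dense vectors, invented for illustration (real models use hundreds of dimensions).
cat = [0.9, 0.1, 0.3]
feline = [0.8, 0.2, 0.35]
car = [0.1, 0.9, -0.2]

print(cosine_similarity(cat, feline))  # close to 1: vectors point the same way
print(cosine_similarity(cat, car))     # much lower: vectors point in different directions

# Toy sparse vectors: only dimension 42 is shared, so only that term contributes.
query = {42: 1.3, 1891: 0.7}
doc = {42: 0.5, 7: 2.0}
print(sparse_dot(query, doc))
```

The sparse score depends only on overlapping token dimensions, which is why sparse vectors support exact-term retrieval, while the dense score reflects overall direction in the vector space.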
