Vector embeddings transform complex data—text, images, audio, video—into numerical representations in high-dimensional space where semantic similarity becomes geometric proximity. They power modern AI applications from semantic search to retrieval-augmented generation (RAG), enabling machines to understand meaning rather than just match keywords. The key insight: distance metrics in embedding space directly measure conceptual similarity, allowing algorithms to find related items without exact string matching. In 2025–2026, the embedding landscape has been reshaped by all-modality models like Gemini Embedding 2 and Voyage 4, mixture-of-experts architectures that cut active-parameter costs by 40%, and asymmetric retrieval patterns that embed documents and queries with different-sized models sharing a compatible vector space. Understanding dimensions, distance metrics, indexing strategies, quantization tradeoffs, and the full retrieval pipeline is essential for building efficient, production-scale AI systems.
What This Cheat Sheet Covers
This topic spans 14 focused tables and 142 indexed concepts. The outline below goes table by table, from foundational concepts through advanced details.
Table 1: Core Embedding Concepts
Embeddings represent data as high-dimensional vectors in which geometric distance encodes meaning; the concepts here define the vocabulary shared across the retrieval and generative AI systems you will encounter. Getting these fundamentals right (especially the distinctions between dense and sparse embeddings and between contextualized and static embeddings) prevents the most common architectural mistakes.
| Concept | Example | Description |
|---|---|---|
| Dense embedding | [0.23, -0.71, 0.44, ...] (768 floats) | • A continuous vector, low-dimensional relative to a vocabulary-sized sparse vector, in which most values are non-zero • captures semantic meaning but loses exact-term recall |
| Sparse embedding | {42: 1.3, 1891: 0.7} (vocab-sized) | • A high-dimensional, mostly-zero vector • each non-zero dimension corresponds to a specific token • enables exact-term retrieval |
| Sentence embedding | model.encode("hello world") | • A single fixed-length vector representing an entire sentence or passage • the standard unit for semantic search and RAG retrieval |
| Contextualized embedding | "bank" in "river bank" ≠ "bank" in "savings bank" | • Vectors computed from the full sentence context, so the same word gets a different vector depending on the surrounding text (e.g., BERT, GPT) |
| Static embedding | word2vec["king"] - word2vec["man"] + word2vec["woman"] | • A fixed per-word vector regardless of context • fast and memory-efficient but cannot represent polysemy |
| Semantic similarity | cos_sim(embed("cat"), embed("feline")) → 0.91 | • Measures conceptual relatedness between two vectors • cosine similarity is the most common metric |
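As a minimal sketch of how the dense, sparse, and similarity rows relate, the Python below scores dense vectors with cosine similarity and sparse `{token_id: weight}` vectors with a dot product over the token IDs they share. The names `cosine_similarity` and `sparse_dot` are illustrative helpers, not any library's API, and the pure-Python loops assume small vectors; production systems typically delegate this to NumPy or a vector database.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two dense vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two sparse vectors stored as {token_id: weight}.

    Only token IDs present in both vectors contribute, which is why
    sparse scoring rewards exact-term overlap.
    """
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b[t] for t, w in a.items() if t in b)
```

For example, with hypothetical 3-dimensional toy vectors `cat = [0.9, 0.1, 0.3]`, `feline = [0.85, 0.15, 0.35]`, and `invoice = [-0.2, 0.9, 0.1]` (not output from any real model, whose vectors have hundreds of dimensions), `cosine_similarity(cat, feline)` comes out at about 0.996 while `cosine_similarity(cat, invoice)` is slightly negative. Scoring the sparse vector `{42: 1.3, 1891: 0.7}` against `{42: 2.0, 7: 0.5}` gives 2.6, contributed entirely by the shared token 42.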