Vector embeddings transform complex data (text, images, audio) into numerical representations in a high-dimensional space where semantic similarity becomes geometric proximity. They power modern AI applications from semantic search to retrieval-augmented generation (RAG), enabling machines to match on meaning rather than just keywords. The key insight: distance in embedding space serves as a proxy for conceptual similarity, so algorithms can find related items without exact string matching. Recent advances, including multi-vector late interaction, instruction-tuned embeddings, and Matryoshka Representation Learning, have improved both retrieval accuracy and efficiency. Understanding embedding dimensions, distance metrics, indexing strategies, and the full retrieval pipeline is essential for building efficient, production-scale AI systems.
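As a rough illustration of proximity standing in for similarity, the sketch below ranks items by cosine similarity. The three-dimensional vectors are hand-picked toy values, not output from any real embedding model, and the names `toy_embeddings` and `nearest` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumption: hand-made 3-dimensional vectors for illustration only.
# Real embeddings have hundreds or thousands of dimensions.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

def nearest(query, embeddings):
    """Return the key whose vector is most similar to the query's vector."""
    q = embeddings[query]
    scores = {
        word: cosine_similarity(q, vec)
        for word, vec in embeddings.items()
        if word != query
    }
    return max(scores, key=scores.get)

print(nearest("king", toy_embeddings))
```

No string in the lookup matches "king", yet the closest vector belongs to "queen", because the two were placed near each other in the toy space.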
What This Cheat Sheet Covers
This cheat sheet spans 14 focused tables and 113 indexed concepts. Below is a complete table-by-table outline, running from foundational concepts through advanced details.
Table 1: Core Embedding Concepts
| Concept | Example | Description |
|---|---|---|
| Dense embedding | `[0.23, -0.45, 0.12, ...]` (768 dims) | • Fixed-size vector where most values are non-zero • captures semantic meaning in a compact, continuous space • produced by neural models such as BERT or OpenAI's embedding models. |
| Sparse embedding | `{42: 2.3, 103: 1.7, 891: 0.9}` | • Vector where most values are zero • uses explicit term-based features • interpretable dimensions often correspond to vocabulary tokens • produced by methods such as SPLADE or BM25. |
| Embedding dimension | 768, 1536, 3072 | • Number of elements in the vector • higher dimensions capture finer nuances but increase memory and compute cost • typical ranges: 384–1536 for text, 512–2048 for images. |
| Semantic similarity | `cosine_sim("king", "queen") = 0.87` | • Degree to which two items share conceptual meaning rather than surface form • measured via distance metrics • embeddings map similarity to proximity. |
| Embedding space | $\mathbb{R}^{1536}$ | • High-dimensional geometric space where each axis represents a learned feature • semantically similar items cluster together • structure emerges from training. |
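To make the dense/sparse distinction and the cost of extra dimensions concrete, here is a minimal sketch. It assumes dense values are stored as 32-bit floats (4 bytes each), which is common but not universal. The helper names `dense_bytes` and `sparse_dot` and the `sparse_query` vector are hypothetical.

```python
def dense_bytes(dims, bytes_per_value=4):
    """Raw storage for one dense vector; assumes float32 (4 bytes) by default."""
    return dims * bytes_per_value

def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: weight} dicts."""
    # Iterate over the smaller dict so the work scales with the number of
    # non-zero entries, not with the full vocabulary size.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[index] for index, weight in a.items() if index in b)

# Memory per dense vector grows linearly with dimension.
for dims in (768, 1536, 3072):
    print(dims, "dims ->", dense_bytes(dims), "bytes per vector")

# Sparse vectors keep only non-zero entries, keyed by token/feature index.
sparse_doc = {42: 2.3, 103: 1.7, 891: 0.9}
sparse_query = {42: 1.0, 500: 0.5}  # hypothetical query vector
print(sparse_dot(sparse_doc, sparse_query))
```

Only index 42 is shared between the two sparse vectors, so it alone contributes to the score. This is also why sparse dimensions are easier to interpret than dense ones.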