Semantic Search Cheat Sheet

Updated 2026-04-27


Semantic search is a retrieval technique that matches on the contextual meaning and intent of a query rather than on exact keywords alone. It sits within natural language processing (NLP), information retrieval, and AI-powered search, and it underpins modern applications such as Retrieval-Augmented Generation (RAG), recommendation engines, and enterprise knowledge bases. Traditional lexical search (BM25, TF-IDF) matches literal terms; semantic search instead maps queries and documents to high-dimensional vector embeddings that capture semantic relationships, so a search for "computer for coding" can surface a document about a "laptop for programming." A critical insight: in production systems, hybrid approaches that combine semantic and lexical signals typically outperform either method alone, because they pair semantic understanding with exact term matching.

What This Cheat Sheet Covers

This topic spans 15 focused tables and 102 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Retrieval Approaches
Table 2: Embedding Models
Table 3: Distance Metrics
Table 4: Indexing Techniques
Table 5: Lexical Search Algorithms
Table 6: Hybrid Search Fusion
Table 7: Query Optimization Techniques
Table 8: Reranking Strategies
Table 9: Chunking Strategies
Table 10: Vector Databases
Table 11: Evaluation Metrics
Table 12: Retrieval-Augmented Generation (RAG)
Table 13: Advanced Techniques
Table 14: Implementation Patterns
Table 15: Common Pitfalls

Table 1: Core Retrieval Approaches

Method	Example	Description
Dense retrieval	`query_emb = model.encode("laptop")` `results = search(query_emb, index)`	• Uses neural embeddings to represent text as continuous vectors • captures semantic meaning rather than surface-level patterns.
Sparse retrieval	`scores = bm25.get_scores(query_tokens)`	• Uses exact term matching with statistical weighting • generates high-dimensional sparse vectors where most values are zero.
Hybrid search	`results = alpha * dense + (1 - alpha) * sparse`	• Combines dense and sparse retrieval into a single ranked list • balances semantic understanding with keyword precision.
Bi-encoder retrieval	`q_emb = encoder(query)` `d_emb = encoder(doc)` `sim = cosine(q_emb, d_emb)`	• Encodes query and document independently then computes similarity • fast at scale but less accurate than cross-encoders.
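
The weighted blend in the hybrid search row can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the two-dimensional embeddings and the BM25-style scores are made-up toy values, the helper names (`cosine`, `min_max`, `hybrid_scores`, `rank`) are hypothetical, and the min-max normalization step is an assumption added here so that dense and sparse scores share a common scale before they are mixed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def min_max(scores):
    """Rescale scores to [0, 1]; an assumed normalization, not from the table."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(dense, sparse, alpha=0.5):
    """Blend per-document scores as alpha * dense + (1 - alpha) * sparse."""
    if len(dense) != len(sparse):
        raise ValueError("dense and sparse must score the same documents")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    d, s = min_max(dense), min_max(sparse)
    return [alpha * x + (1 - alpha) * y for x, y in zip(d, s)]


def rank(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data (hypothetical): precomputed embeddings stand in for a real encoder.
query_emb = [1.0, 0.0]
doc_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Bi-encoder style scoring: query and documents were embedded independently.
dense_scores = [cosine(query_emb, d) for d in doc_embs]

# Made-up lexical scores standing in for the output of a BM25 scorer.
sparse_scores = [0.0, 2.0, 4.0]

for alpha in (1.0, 0.5, 0.0):
    order = rank(hybrid_scores(dense_scores, sparse_scores, alpha))
    print(f"alpha={alpha}: ranking={order}")
```

Setting `alpha=1.0` reduces the blend to dense-only ranking and `alpha=0.0` to sparse-only ranking; values in between trade semantic similarity against exact term matching. In practice `alpha` would be tuned on a labeled query set.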
