Semantic search is a data retrieval technique that focuses on the contextual meaning and intent behind a user's query rather than relying solely on exact keyword matching. It sits within the broader fields of natural language processing (NLP), information retrieval, and AI-powered search, and it has become foundational to modern applications such as Retrieval-Augmented Generation (RAG), recommendation engines, and enterprise knowledge bases. Unlike traditional lexical search (BM25, TF-IDF), which matches literal terms, semantic search maps queries and documents into high-dimensional vector embeddings that capture semantic relationships, so a search for "computer for coding" can surface a document about a "laptop for programming." A critical insight: hybrid approaches that combine semantic and lexical signals nearly always outperform either method alone in production systems, because they balance semantic understanding with exact term matching.
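To make the lexical-versus-semantic contrast concrete, here is a minimal sketch in plain Python. The three-dimensional vectors in `TOY_EMBEDDINGS` are hand-picked assumptions for illustration, not the output of any real embedding model, and `lexical_overlap` is a deliberately crude stand-in for a lexical scorer such as BM25.

```python
import math

# Hypothetical 3-dimensional "embeddings", hand-picked for illustration only.
# A real system would obtain these vectors from a trained embedding model.
TOY_EMBEDDINGS = {
    "computer for coding": [0.90, 0.80, 0.10],
    "laptop for programming": [0.85, 0.75, 0.20],
    "coding bootcamp for kids": [0.20, 0.60, 0.90],
}


def lexical_overlap(query, doc):
    """Fraction of query words that literally appear in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = "computer for coding"
docs = ["laptop for programming", "coding bootcamp for kids"]

# Lexical matching rewards shared surface words ("coding", "for").
best_lexical = max(docs, key=lambda d: lexical_overlap(query, d))

# Embedding similarity rewards closeness in vector space instead.
best_semantic = max(
    docs, key=lambda d: cosine(TOY_EMBEDDINGS[query], TOY_EMBEDDINGS[d])
)

print("lexical pick: ", best_lexical)
print("semantic pick:", best_semantic)
```

With these toy values the word-overlap scorer prefers the bootcamp document because it shares the literal token "coding", while cosine similarity prefers the laptop document. This is the kind of gap that embeddings are meant to close.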
What This Cheat Sheet Covers
This topic spans 15 focused tables and 102 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Retrieval Approaches
The fundamental ways to fetch relevant documents, and the trade-offs that define the whole field. Dense retrieval captures meaning, sparse nails exact terms, and hybrid combines both — while the bi-encoder versus cross-encoder distinction is the speed-versus-precision dial that explains why production systems retrieve fast then rerank carefully.
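The "retrieve fast, then rerank carefully" pattern can be sketched as follows. This is an illustrative outline under stated assumptions, not a reference implementation: the score dictionaries are made-up numbers, `min_max`, `hybrid_scores`, and `retrieve_then_rerank` are hypothetical helper names, and `toy_rerank` stands in for a cross-encoder that would score each (query, document) pair jointly.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so dense and sparse values are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(dense, sparse, alpha=0.5):
    """Weighted fusion: alpha * dense + (1 - alpha) * sparse, after scaling."""
    d, s = min_max(dense), min_max(sparse)
    docs = set(d) | set(s)
    return {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in docs
    }


def retrieve_then_rerank(dense, sparse, rerank_score, alpha=0.5, k=2):
    """Stage 1: cheap hybrid shortlist. Stage 2: costlier rerank of the top k."""
    fused = hybrid_scores(dense, sparse, alpha)
    shortlist = sorted(fused, key=fused.get, reverse=True)[:k]
    return sorted(shortlist, key=rerank_score, reverse=True)


# Made-up first-stage scores for three documents (assumed values).
dense = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}   # e.g. cosine similarity
sparse = {"doc_a": 1.2, "doc_b": 7.5, "doc_c": 6.9}     # e.g. BM25 scores

# Stand-in for a cross-encoder's relevance scores (assumed values).
toy_rerank = {"doc_a": 0.9, "doc_b": 0.6, "doc_c": 0.1}

fused = hybrid_scores(dense, sparse)
first_stage = sorted(fused, key=fused.get, reverse=True)
final = retrieve_then_rerank(dense, sparse, toy_rerank.get)

print("first stage:", first_stage)
print("after rerank:", final)
```

Min-max scaling is only one way to put the two score types on a common scale. Rank-based fusion is a common alternative when raw score distributions differ widely between retrievers.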
| Method | Example | Description |
|---|---|---|
| Dense retrieval | `query_emb = model.encode("laptop")`<br>`results = search(query_emb, index)` | • Uses neural embeddings to represent text as continuous vectors<br>• Captures semantic meaning rather than surface-level patterns |
| Sparse retrieval | `scores = bm25.get_scores(query_tokens)` | • Uses exact term matching with statistical weighting<br>• Generates high-dimensional sparse vectors where most values are zero |
| Hybrid retrieval | `results = alpha*dense + (1-alpha)*sparse` | • Combines dense and sparse retrieval into a single ranked list<br>• Balances semantic understanding with keyword precision |
| Bi-encoder | `q_emb = encoder(query)`<br>`d_emb = encoder(doc)`<br>`sim = cosine(q_emb, d_emb)` | • Encodes query and document independently, then computes similarity<br>• Fast at scale but less accurate than cross-encoders |