Retrieval-Augmented Generation (RAG) grounds a language model in external documents instead of relying solely on its frozen weights, and "advanced RAG" is the engineering discipline of making every stage of that pipeline β indexing, retrieval, query handling, post-processing, and generation β measurably better than the naive embed-and-fetch-top-k baseline. It matters because most production failures are retrieval failures, not generation failures: the model answers well when given the right context and hallucinates when given the wrong one, so the highest-leverage work happens before the LLM ever sees a token. The key mental model to carry into the tables below: treat RAG as a multi-stage funnel where recall is won early (hybrid retrieval, smart chunking, query transformation) and precision is won late (reranking, compression, context ordering) β and that bigger context windows do not remove the need for sharp retrieval, they just hide the cost of imprecision until it silently degrades answers.
What This Cheat Sheet Covers
This topic spans 9 focused tables and 72 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Hybrid and Neural Retrieval Foundations
The retrieval layer decides the ceiling on answer quality; combining lexical exactness with semantic understanding, and adding neural ranking, is the foundation every other advanced pattern builds on.
| Technique | Example | Description |
|---|---|---|
vectorstore.similarity_search(query, k=5) | Embeds query and chunks into vectors, retrieves by cosine/dot-product similarity; captures semantic meaning but can miss exact keywords. | |
BM25Retriever.from_documents(docs) | Probabilistic term-frequency ranking; excels at exact keyword, code, and rare-term matches that dense embeddings blur. | |
hybrid(query, alpha=0.5) | Runs lexical and vector retrievers in parallel and merges results; consistently beats either alone on heterogeneous corpora and is the de facto production default. | |
\text{RRF}(d)=\sum_{r}\frac{1}{k+\text{rank}_r(d)} | Rank-only fusion of multiple result lists (kβ60); merges hybrid or multi-query results without needing comparable scores. | |
reranker.rank(query, docs)[:top_n] | Second stage that jointly encodes query+document for a precise relevance score; slow but high-precision, applied only to top candidates. |