AI-LLM Hallucination Prevention Cheat Sheet

Updated 2026-04-05

Hallucinations in large language models are confident but factually incorrect, nonsensical, or ungrounded responses—a fundamental challenge that emerges from the probabilistic nature of token-by-token prediction in transformer architectures. Preventing hallucinations requires grounding outputs in verifiable sources, constraining generation behavior, and implementing multi-layered verification rather than relying solely on the model's training. Modern RAG pipelines have advanced substantially, with hybrid retrieval, reranking, and self-reflective retrieval strategies significantly improving factual grounding. The key insight: effective hallucination prevention is an orchestration problem, combining prompt design, retrieval mechanisms, sampling strategies, and post-generation validation into a coherent system where each layer compensates for the others' weaknesses.

What This Cheat Sheet Covers

This cheat sheet spans 12 focused tables and 85 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Grounding Techniques
Table 2: Prompt Engineering Strategies
Table 3: Sampling and Decoding Parameters
Table 4: Post-Generation Verification
Table 5: Training and Fine-Tuning Approaches
Table 6: Architectural and Model-Level Strategies
Table 7: Confidence and Uncertainty Estimation
Table 8: Human-in-the-Loop and Oversight
Table 9: Evaluation and Detection Metrics
Table 10: Advanced Detection Methods
Table 11: Monitoring and Production Strategies
Table 12: Testing and Red-Teaming

Table 1: Grounding Techniques

Technique	Example	Description
Retrieval-Augmented Generation (RAG)	`query = "Tesla revenue 2025"` `docs = retriever.search(query)` `prompt = f"Based on: {docs}, answer: {query}"`	• Retrieves relevant documents from an external knowledge base before generation, anchoring responses in retrieved facts rather than parametric knowledge alone • can reduce hallucination rate by over 40% when properly implemented, though the gain depends on retrieval quality and the task.
Hybrid retrieval (BM25 + dense)	`results_kw = bm25.search(query, top_k=20)` `results_vec = vector_db.search(embed(query), top_k=20)` `fused = reciprocal_rank_fusion(results_kw, results_vec)`	• Combines sparse keyword search (BM25) and dense semantic search via Reciprocal Rank Fusion • keyword search catches exact terms and IDs that embeddings miss, while semantic search catches paraphrases that keyword search misses.
Retrieval reranking	`candidates = retrieve_top_k(query, k=50)` `scores = cross_encoder.score(query, candidates)` `ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)` `context = [doc for _, doc in ranked[:5]]`	• Uses a cross-encoder model to re-score retrieved candidates against the query, then keeps the highest-scoring passages • narrows a large recall pool to the most relevant context before generation.
Self-RAG (Self-Reflective RAG)	`# Model decides when to retrieve` `output = self_rag_model.generate(prompt)` `# Reflection tokens: [Retrieve], [IsREL], [IsSUP]`	• Fine-tuned LLM decides on-demand whether to retrieve documents and critiques its own output via learned reflection tokens • outperforms static RAG by skipping retrieval when unnecessary and self-verifying generated claims.
Corrective RAG (CRAG)	`score = retrieval_evaluator(query, docs)` `if score < threshold:` `docs = web_search(query)` `answer = llm(query, docs)`	• Lightweight retrieval evaluator scores the retrieved documents • if confidence is low, it falls back to web search • documents judged relevant are refined with a decompose-then-recompose step that strips noisy passages before the context reaches the LLM.
Grounding with citations	`response = model.generate(prompt)` `citations = extract_sources(response)` `answer = {"text": response, "citations": citations}`	• Requires the model to explicitly cite sources for factual claims, making verification straightforward • enables users to trace statements back to the original documents and identify unsupported assertions.
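
The hybrid retrieval row calls `reciprocal_rank_fusion` without showing it. The block below is a minimal sketch of one way to write that helper, not a reference implementation. It assumes each retriever returns a list of document IDs ordered best first, and it uses the constant `k=60`, a commonly used default that the table does not specify. The document IDs are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    *rankings: Sequence[Hashable], k: int = 60
) -> List[Hashable]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not from the table.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable).
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical IDs standing in for BM25 and vector-search results.
results_kw = ["doc_a", "doc_b", "doc_c"]
results_vec = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion(results_kw, results_vec)
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (`doc_b`, `doc_a`) rise above documents found by only one retriever. Because the fusion uses ranks rather than raw scores, BM25 and embedding similarity values do not need to be normalized against each other.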
