Context Engineering Cheat Sheet

Updated 2026-03-17


Context Engineering is the systematic discipline of designing, structuring, and managing the information fed to Large Language Models (LLMs) and AI agents to optimize their performance, accuracy, and efficiency. Unlike traditional prompt engineering, which focuses on crafting individual queries, context engineering operates at the systems level: it manages the full data environment, including memory, external knowledge, tool definitions, conversation history, and environmental signals. As models scale and context windows grow to millions of tokens, the challenge shifts from "what can we fit?" to "what should we include, and how should we organize it?" Context engineering answers this by applying information architecture, relevance ranking, compression, and dynamic adaptation so that AI systems receive the right information at the right time without overwhelming the model's limited attention or incurring excessive cost.
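A minimal sketch of the "what should we include?" decision: given candidate context items with priorities, keep the most important ones that fit a token budget, then restore their original order. `ContextItem`, the priority values, and `pack_context` are hypothetical names for illustration, and `count_tokens` is a whitespace-word approximation standing in for the model's real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str        # label such as "system", "memory", or "doc:3"
    text: str
    priority: float  # higher means more important to keep


def count_tokens(text: str) -> int:
    # Crude approximation: whitespace-separated words. Use the model's tokenizer in practice.
    return len(text.split())


def pack_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Keep the highest-priority items that fit in `budget` tokens, in original order."""
    by_priority = sorted(range(len(items)), key=lambda i: items[i].priority, reverse=True)
    keep, used = set(), 0
    for i in by_priority:
        cost = count_tokens(items[i].text)
        if used + cost <= budget:  # skip items that would overflow; smaller ones may still fit
            keep.add(i)
            used += cost
    # Restore the original order so the assembled prompt still reads coherently.
    return [item for i, item in enumerate(items) if i in keep]


if __name__ == "__main__":
    items = [
        ContextItem("system", "You are a helpful assistant.", priority=10),
        ContextItem("doc:old", "Archived release notes from an older product version.", priority=1),
        ContextItem("doc:relevant", "RAG injects retrieved documents.", priority=5),
        ContextItem("question", "What is RAG?", priority=10),
    ]
    kept = pack_context(items, budget=12)
    print([it.name for it in kept])  # ['system', 'doc:relevant', 'question']
```

Greedy packing by priority is a simple design choice: it never drops a high-priority item in favor of a lower one, but it may skip a large item and still admit smaller, less important ones behind it. Compression (summarizing the overflow instead of dropping it) is a common alternative.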

What This Cheat Sheet Covers

This cheat sheet contains 12 focused tables covering 98 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Context Engineering Techniques
Table 2: Context Window Management Strategies
Table 3: Retrieval & RAG Enhancement Techniques
Table 4: Context Structuring & Formatting
Table 5: Memory & Session State Management
Table 6: User Intent & Conversation Modeling
Table 7: Context Compression & Reduction
Table 8: Multimodal Context Integration
Table 9: Context Relevance & Quality Control
Table 10: Cost & Performance Optimization
Table 11: Advanced Context Patterns for AI Agents
Table 12: Emerging & Specialized Techniques

Table 1: Core Context Engineering Techniques

Technique	Example	Description
Retrieval-Augmented Generation (RAG)	`query = "What is RAG?"` `docs = retriever.get_relevant(query)` `context = f"Context: {docs}\nQuery: {query}"`	• Dynamically retrieves relevant documents from external knowledge bases and injects them into the context window • can reduce hallucinations and gives the model up-to-date information without retraining.
Context Window Optimization	`context_limit = 128000` `used_tokens = count_tokens(context)` `if used_tokens > context_limit:` `    context = truncate(context, context_limit)`	Managing the finite token budget by selecting, prioritizing, and structuring information to fit within model limits while preserving critical content.
Semantic Chunking	`chunks = semantic_splitter.split(` `text, max_size=512, overlap=50` `)`	• Dividing documents into meaningful segments based on topic boundaries and semantic coherence rather than arbitrary character counts • improves retrieval precision.
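Contextual Retrieval	`chunk_with_context = f"{doc_summary}\n{chunk}"`	• Prepends each chunk with a short description of its source document before embedding and indexing, so the chunk is no longer ambiguous when read on its own • in Anthropic's published evaluation, contextual embeddings combined with contextual BM25 reduced top-20 retrieval failures by 49%.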
Reranking	`results = initial_retrieval(query)` `reranked = cross_encoder.rank(` `query, results` `)[:top_k]`	• Applies a second-stage model (e.g., a cross-encoder that reads the query and each document together) to reorder the retrieved candidates by relevance to the query • improves precision at extra compute cost, which is why it is usually run only on a first-stage shortlist.
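The Semantic Chunking row can be approximated without any model: embed each sentence, then start a new chunk whenever the similarity between neighboring sentences drops below a threshold (a likely topic shift) or the chunk would exceed its size limit. This sketch uses a bag-of-words vector with cosine similarity as a stand-in for a real sentence-embedding model; `semantic_split`, `threshold`, and `max_words` are hypothetical names, and sizes are counted in words rather than tokens.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    # Toy embedding: bag of lowercase words. Replace with a sentence-embedding model in practice.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_split(text: str, threshold: float = 0.2, max_words: int = 200) -> list[str]:
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[list[str]] = []
    size = 0         # words in the current chunk
    prev_vec = None  # embedding of the previous sentence
    for sent in sentences:
        vec = embed(sent)
        words = len(sent.split())
        topic_shift = prev_vec is not None and cosine(prev_vec, vec) < threshold
        if not chunks or topic_shift or size + words > max_words:
            chunks.append([sent])  # start a new chunk at a breakpoint
            size = words
        else:
            chunks[-1].append(sent)
            size += words
        prev_vec = vec
    return [" ".join(c) for c in chunks]


if __name__ == "__main__":
    text = ("Cats are small pets. Cats like to sleep. "
            "The stock market fell today. The stock market may recover.")
    for chunk in semantic_split(text):
        print(chunk)
```

Overlap, as in the table's `overlap=50`, could be added by copying the last sentence of each chunk into the start of the next; it is left out here to keep the breakpoints easy to see. The threshold of 0.2 only suits this toy embedding; with a real model it would need tuning on your data, for example as a percentile of the observed neighbor similarities.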
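The RAG and Contextual Retrieval rows combine naturally: index each chunk with its document context prepended, retrieve against that enriched text, and pass the matching chunks to the model with source labels. In this sketch a toy word-overlap scorer stands in for embeddings or BM25, and one document-level context string is reused for every chunk, as in the table's `doc_summary` example; Anthropic's published technique instead has an LLM write chunk-specific context. `Chunk`, `contextualize`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str          # original chunk text, shown to the model
    indexed_text: str  # context-prefixed text, used only for retrieval


def tokenize(text: str) -> set[str]:
    # Lowercase word set; a crude stand-in for BM25 or embedding features.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def contextualize(doc_id: str, doc_context: str, chunks: list[str]) -> list[Chunk]:
    # Prepend the same document-level context to every chunk before indexing.
    return [Chunk(doc_id, c, f"{doc_context}\n{c}") for c in chunks]


def retrieve(query: str, index: list[Chunk], top_k: int = 2) -> list[Chunk]:
    # Score each chunk by words shared with the query; ties keep index order.
    q = tokenize(query)
    scored = [(len(q & tokenize(ch.indexed_text)), i) for i, ch in enumerate(index)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index[i] for score, i in scored[:top_k] if score > 0]


def build_prompt(query: str, chunks: list[Chunk]) -> str:
    # Number each source so the model can cite it.
    sources = "\n".join(f"[{n}] ({ch.doc_id}) {ch.text}" for n, ch in enumerate(chunks, 1))
    return f"Context:\n{sources}\n\nQuery: {query}"


if __name__ == "__main__":
    index = contextualize(
        "acme-q2", "From ACME Corp's Q2 2023 quarterly filing.",
        ["Revenue grew 3% over the previous quarter.", "The board approved a new buyback."],
    ) + contextualize(
        "beta-q2", "From Beta Inc's Q2 2023 quarterly filing.",
        ["Revenue grew 5% over the previous quarter."],
    )
    query = "How much did ACME revenue grow?"
    print(build_prompt(query, retrieve(query, index, top_k=1)))
```

Without the prepended context, this query would tie between the ACME and Beta revenue chunks, because neither chunk names its company; with it, the ACME chunk ranks first. Passing the original chunk text, rather than the enriched text, to the model is a design choice; including the context prefix as well is equally reasonable.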
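The Reranking row describes a two-stage pipeline: a cheap first stage narrows the corpus to a shortlist, and a more expensive model rescores only that shortlist. This sketch keeps both scorers pluggable; `two_stage_search`, `candidates`, and the toy scorers in the usage example are hypothetical. The reranker receives the whole shortlist at once so that a real cross-encoder can score all pairs in one batch; with the sentence-transformers library, for example, it could wrap `CrossEncoder.predict` called on a list of `(query, document)` pairs.

```python
from typing import Callable, Sequence

FirstStage = Callable[[str, str], float]                # (query, doc) -> score
Reranker = Callable[[str, list[str]], Sequence[float]]  # (query, docs) -> one score per doc


def two_stage_search(query: str, docs: list[str], first_stage: FirstStage,
                     reranker: Reranker, candidates: int = 20, top_k: int = 3) -> list[str]:
    # Stage 1: cheap, recall-oriented scoring over the whole corpus
    # (in practice BM25 or a bi-encoder vector search).
    shortlist = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:candidates]
    # Stage 2: expensive, precision-oriented scoring of the shortlist only
    # (in practice a cross-encoder reading query and document together).
    scores = reranker(query, shortlist)
    order = sorted(range(len(shortlist)), key=lambda i: scores[i], reverse=True)
    return [shortlist[i] for i in order[:top_k]]


if __name__ == "__main__":
    def word_overlap(query: str, doc: str) -> float:
        # Toy first stage: count of shared words, ignoring order.
        return len(set(query.lower().split()) & set(doc.lower().split()))

    def phrase_reranker(query: str, docs: list[str]) -> list[float]:
        # Toy second stage: also rewards the exact query phrase, which word overlap ignores.
        return [word_overlap(query, d) + (1.0 if query.lower() in d.lower() else 0.0) for d in docs]

    docs = ["pie recipe with apple", "the best apple pie recipe", "apple orchards"]
    print(two_stage_search("apple pie", docs, word_overlap, phrase_reranker,
                           candidates=2, top_k=2))
    # ['the best apple pie recipe', 'pie recipe with apple']
```

The `candidates` parameter is the main cost lever: the reranker runs once per shortlisted document, so a larger shortlist raises recall at the price of more second-stage compute.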
