AI memory and reasoning systems enable large language models and agents to retain information across interactions and solve complex problems through structured thought processes. Memory systems range from short-term conversation buffers to persistent knowledge graphs, while reasoning techniques guide models through step-by-step problem decomposition, verification, and refinement. In 2026, context engineering has emerged as the overarching discipline for managing what information lives in the context window at each step. Memory scaling, the property that agent performance improves as accumulated experience grows, has become a new scaling axis alongside parametric and inference-time scaling. Understanding the interplay between memory architecture, reasoning strategy, multi-agent coordination, and efficient context management is critical for building production agents that maintain context, reduce hallucinations, and execute multi-step workflows reliably.
What This Cheat Sheet Covers
This topic spans 17 focused tables and 134 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Memory Types and Architecture
Memory in LLM agents draws directly from cognitive science: separating episodic, semantic, and procedural stores enables selective retrieval and prevents context overload. Choosing the right type — or the right combination — determines both how much an agent remembers and how cheaply it retrieves the right fact at query time.
| Type | Example | Description |
|---|---|---|
| Short-term memory | `messages = [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]` | Stores recent conversation turns within the current session; limited by the context window size and discarded when the session ends. |
| Long-term memory | `vector_db.store(embedding, metadata, user_id)` | Persists knowledge across sessions in external storage; retrieved dynamically by relevance rather than loaded wholesale. |
| Episodic memory | `{"timestamp": "2026-04-01", "event": "user asked about X", "outcome": "provided Y"}` | Records specific events and interactions with temporal context; lets the agent recall what happened when and learn from past experiences. |
| Semantic memory | `knowledge_graph.add_fact("Paris", "capital_of", "France")` | Stores general facts and relationships independent of when they were learned; declarative knowledge without time markers. |
| Procedural memory | `workflow = ["analyze_input", "generate_plan", "execute_steps", "verify_output"]` | Encodes learned skills, workflows, and action sequences; guides how to perform tasks rather than storing facts, often implemented as tool patterns or code. |
| Observation memory | `obs = synthesize(raw_facts, level="insight"); store.add(obs, type="observation")` | Higher-order synthesis of raw episodic and semantic memories into generalizable insights; improves multi-hop recall by giving retrieval access to richer, more abstract representations. |
| Vector memory | `results = vector_db.search(query_embedding, top_k=5)` | Embeds memories as vectors; enables semantic similarity search to retrieve contextually relevant information from large memory stores. |
| Graph memory | `graph.add_edge("user_123", "prefers", "dark_mode")` | Represents knowledge as entities and relationships in a graph structure; captures complex associations and multi-hop reasoning paths. |
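
The rows above can be combined into a single store that tags each memory with its type and ranks candidates by embedding similarity. The following is a minimal sketch, not a reference implementation: `MemoryRecord`, `ToyMemoryStore`, and `cosine` are hypothetical names introduced for illustration, and the hand-written 3-D vectors stand in for the output of a real embedding model. It assumes an in-memory list is acceptable; a production system would more likely use a vector database.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    """One stored memory; `kind` is 'episodic', 'semantic', or 'procedural'."""
    text: str
    kind: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyMemoryStore:
    """Hypothetical in-memory store: linear scan, no persistence."""

    def __init__(self):
        self.records = []

    def store(self, record):
        self.records.append(record)

    def search(self, query_embedding, top_k=5, kind=None):
        # Selective retrieval: optionally restrict to one memory type first.
        candidates = [r for r in self.records if kind is None or r.kind == kind]
        # Rank the remaining records by similarity to the query, best first.
        ranked = sorted(
            candidates,
            key=lambda r: cosine(query_embedding, r.embedding),
            reverse=True,
        )
        return ranked[:top_k]


store = ToyMemoryStore()
# Toy embeddings are made up; a real system would compute them with a model.
store.store(MemoryRecord("Paris is the capital of France", "semantic", [1.0, 0.0, 0.0]))
store.store(MemoryRecord("user asked about X", "episodic", [0.0, 1.0, 0.0],
                         {"timestamp": "2026-04-01", "outcome": "provided Y"}))
store.store(MemoryRecord("analyze_input -> generate_plan -> execute_steps -> verify_output",
                         "procedural", [0.0, 0.0, 1.0]))
store.store(MemoryRecord("user_123 prefers dark_mode", "semantic", [0.6, 0.8, 0.0]))

query = [0.9, 0.1, 0.0]

for record in store.search(query, top_k=2):
    print(record.kind, "|", record.text)
# semantic | Paris is the capital of France
# semantic | user_123 prefers dark_mode

for record in store.search(query, kind="episodic"):
    print(record.metadata["timestamp"], "|", record.text)
# 2026-04-01 | user asked about X
```

Filtering by `kind` before ranking is one way to get the selective retrieval described above: a query about past events never has to compete with procedural or semantic entries for the limited `top_k` slots.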