AI agents are autonomous systems built on large language models (LLMs) that perceive their environment, reason through complex tasks, and take actions with external tools to achieve goals. Unlike traditional chatbots, which simply respond to queries, agents run a continuous think-act-observe loop and plan each next step from the outcome of the last. The defining characteristic is tool use: agents do not just generate text; they execute functions, query databases, call APIs, and coordinate with other agents and user interfaces through standardized protocols such as MCP, A2A, and AG-UI. This shift from prediction to execution makes agents the foundation of agentic AI, turning LLMs from assistants into operational systems capable of multi-step workflows, self-correction, and long-horizon task completion. Understanding agent architecture (perception, reasoning engines, memory systems, and orchestration patterns) is essential for building reliable production agents in 2026. Frameworks such as LangGraph, CrewAI, OpenAI Agents SDK, Google ADK, and PydanticAI provide production-ready primitives, while evaluation approaches such as DeepEval and LLM-as-Judge enable systematic quality measurement.
19 tables, 135 concepts.
Table 1: Core Agent Concepts
| Concept | Example | Description |
|---|---|---|
| Agent Loop | `while not done: thought = think(observation); action = decide(thought); observation = execute(action)` | • Continuous think-act-observe cycle where agent reasons about current state, takes action, observes result, then repeats • fundamental execution pattern for all agents. |
| Tool Use | `tools = [calculator, web_search, db_query]`<br>`agent.invoke("What's 2^16?", tools)` | • Agent invokes external functions or APIs to perform actions beyond text generation • requires structured tool definitions with JSON schemas describing parameters and expected outputs. |
| Function Calling | `{"name": "get_weather", "args": {"city": "NYC"}}` | • LLM generates structured function invocation by selecting appropriate tool and filling parameters • model returns JSON that agent runtime executes, not the result itself. |
| Goal-Driven Execution | Goal → Plan → Execute → Reflect → Retry | • Goal-driven execution pattern where agent decomposes tasks, takes actions, evaluates outcomes, and adjusts strategy • contrasts with linear prompt chains. |
| Perception | `raw_input = {"text": query, "context": session}`<br>`structured_data = parse(raw_input)` | • Transforms raw inputs (text, API calls, sensor data) into structured representations the reasoning engine processes • includes parsing, normalization, and context extraction. |
| Reasoning Engine | LLM + planning algorithm + memory | • Core decision-making component that analyzes inputs, selects actions, and generates plans • typically an LLM enhanced with prompting strategies like ReAct or chain-of-thought. |
| Action Space | `["send_email", "query_db", "http_get", "file_write"]` | • Set of available tools and operations agent can execute • defines boundaries of what agent can accomplish in its environment. |
| Autonomy Level | Agent decides when to stop vs. asks user for next step | • Degree to which agent makes decisions without human intervention • ranges from fully autonomous to human-in-the-loop requiring approval per action. |
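
The think-act-observe cycle and the split between "model proposes a call" and "runtime executes it" can be sketched as below. This is a minimal illustration, not any framework's API: `scripted_model`, `TOOLS`, and `run_agent` are hypothetical names, and the scripted function stands in for a real LLM so the example is deterministic.

```python
from typing import Callable, Optional

# Hypothetical tool registry; the tool name and function are illustrative only.
TOOLS: dict[str, Callable[..., object]] = {
    "power": lambda base, exp: base ** exp,
}

def scripted_model(observation: Optional[object]) -> dict:
    """Stand-in for an LLM: first emits a tool call, then a final answer."""
    if observation is None:
        return {"name": "power", "args": {"base": 2, "exp": 16}}
    return {"final": f"The answer is {observation}"}

def run_agent(model, max_steps: int = 5) -> str:
    observation = None
    for _ in range(max_steps):               # bounded version of `while not done`
        decision = model(observation)        # think and decide
        if "final" in decision:              # model signals completion
            return decision["final"]
        tool = TOOLS[decision["name"]]       # the runtime, not the model, runs the tool
        observation = tool(**decision["args"])  # act, then observe the result
    raise RuntimeError("step budget exhausted")

answer = run_agent(scripted_model)
```

The step budget is a design choice: it keeps a misbehaving model from looping forever, which the plain `while not done` form does not guarantee.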
Table 2: Architecture Patterns
| Pattern | Example | Description |
|---|---|---|
| ReAct | `Thought: Need price data`<br>`Action: query_db("SELECT price")`<br>`Observation: $42`<br>`Thought: Now calculate...` | • Reasoning and Acting interleaved • agent generates Thought explaining reasoning, executes Action, receives Observation of result, then generates next Thought • think-act-observe pattern improves reliability and debuggability. |
| Reflection | `output = generate()`<br>`critique = evaluate(output, criteria)`<br>`if not good: revise(critique)` | • Agent self-evaluates outputs against quality criteria, then iterates to improve • generates, critiques, and refines until meeting standards or max attempts. |
| Generator-Evaluator | `output = generator.propose(task)`<br>`feedback = evaluator.critique(output, criteria)`<br>`if not done: generator.revise(feedback)` | • Generator produces output, evaluator critiques against criteria, loop repeats until quality threshold met • enables autonomous quality assurance via two cooperating agents instead of one. |
| Planning | Goal → Subtasks → Order → Execute | • Agent decomposes complex goals into actionable subtasks, determines execution order, then coordinates completion • enables long-horizon task solving. |
| Chain-of-Thought | Let's think step-by-step:<br>1. Parse question<br>2. Identify data needed<br>3. Calculate result | • Prompts agent to show intermediate reasoning steps • improves accuracy on complex tasks by making thought process explicit before answering. |
| Router | `if query_type == "sql": route_to(sql_agent)`<br>`else: route_to(general_agent)` | • Conditional dispatcher that directs requests to specialized agents or tools based on content analysis • enables modular, expert-based architectures. |
| Supervisor-Worker | `supervisor.assign(task, worker_agents)`<br>`results = await all_workers()`<br>`supervisor.synthesize(results)` | • Hierarchical delegation where supervisor breaks work into parallel subtasks, assigns to worker agents, aggregates results • scales complex workflows efficiently. |
| Handoff | `handoff(source=triage_agent, target=billing_agent, condition="billing question")` | • One agent transfers control to another specialized agent mid-conversation • enables seamless delegation where the receiving agent continues with full context. |
| Plan-and-Execute | `plan = planner.create(goal)`<br>`for step in plan: execute(step)` | • Separates planning from execution • planner creates full task breakdown upfront, then executor runs each step • clearer than pure ReAct for multi-step workflows. |
| Tree-of-Thought | Explore multiple reasoning paths in tree structure, backtrack if needed | • Agent explores multiple solution paths simultaneously, evaluating each branch before committing • useful when single linear reasoning path may fail. |
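
A generate-critique-revise loop with an attempt cap might look like the following sketch. `generate` and `critique` are hypothetical, rule-based stand-ins for LLM calls; only the control flow is the point here.

```python
from typing import Optional

def generate(task: str, feedback: Optional[str] = None) -> str:
    """Hypothetical generator: adds a summary only after being asked for one."""
    draft = f"Report on {task}."
    if feedback and "summary" in feedback:
        draft += " Summary: done."
    return draft

def critique(output: str) -> Optional[str]:
    """Hypothetical evaluator: returns None when the output passes the criteria."""
    return None if "Summary:" in output else "add a summary"

def reflect_loop(task: str, max_attempts: int = 3) -> tuple[str, int]:
    feedback = None
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = generate(task, feedback)   # generate, or revise using feedback
        feedback = critique(output)         # evaluate against criteria
        if feedback is None:                # quality threshold met
            return output, attempt
    return output, max_attempts             # best effort after the cap

final_output, attempts = reflect_loop("Q3 sales")
```

Splitting `generate` and `critique` into two separately prompted agents turns the same loop into the generator-evaluator variant.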
Table 3: Memory Systems
| Type | Example | Description |
|---|---|---|
| Short-Term Memory | Current conversation context in prompt | • Working context for active task • contents of current LLM context window including system prompt, recent messages, and intermediate outputs. |
| Long-Term Memory | Vector DB storing past interactions | • Persistent storage across sessions • agent retrieves relevant historical context to inform current decisions • critical for maintaining user preferences and learning. |
| Graphiti (Temporal Knowledge Graph) | `graphiti.add_episode(messages)`<br>`results = graphiti.search("user prefs", center_node_uuid)` | • Temporal knowledge graph memory storing facts with validity windows • outperforms vector-only approaches on cross-session temporal reasoning (63.8% vs 49% on LongMemEval). |
| Episodic Memory | "On 2026-01-15, user preferred JSON output" | • Stores specific past events and experiences with temporal context • enables agent to recall "what happened when" for situational awareness. |
| Semantic Memory | "User always wants reports in PDF format" | • Stores facts, preferences, and general knowledge learned over time • extracted and consolidated from episodic traces into persistent rules. |
| Procedural Memory | Learned workflows like "how to file a bug report" | • Encodes how to perform tasks • captures successful action sequences as reusable procedures the agent can invoke. |
| Tiered Memory Runtime | `agent = create_agent(model="openai/gpt-4o")`<br>`agent.send_message("Prefer JSON output")` | • OS-inspired tiered memory runtime (core memory, recall storage, archival) where agents self-edit their own memory blocks • supports background learning and persistent, cross-session knowledge. |
| Working State | Variables tracking current task state | • Active state during execution • holds intermediate results, loop counters, and task-specific variables distinct from conversation history. |
| Shared Memory | `redis.set("team_context", state)`<br>`other_agent.get("team_context")` | • Cross-agent state enabling coordination • multiple agents read/write shared context to maintain consistency in multi-agent systems. |
Table 4: Multi-Agent Systems
| Pattern | Example | Description |
|---|---|---|
| Hierarchical | manager → specialists → workers | • Supervisor delegates to specialist agents who may further delegate to workers • tasks flow downward, results aggregate upward • mimics organizational structure. |
| Collaborative | Agents discuss and debate to reach consensus | • Peer agents interact to solve problems jointly • no fixed hierarchy, agents contribute expertise and negotiate solutions. |
| Blackboard | `board.write("analysis", result)`<br>`if board.has_new("data"): agent.process(board.read("data"))` | • Agents read/write to a shared state space without direct messaging • agents activate when relevant data appears, enabling loose coupling and emergent coordination without a central orchestrator. |
| Sequential Pipeline | Output of Agent A → Input to Agent B → Output to Agent C | • Linear handoff where each agent processes and passes work to next • common for assembly-line workflows with distinct stages. |
| Parallel Execution | Multiple agents execute tasks simultaneously, results merged | • Agents work on independent subtasks concurrently • coordinator aggregates outputs • reduces total latency for decomposable work. |
| Decentralized Network | Agents communicate peer-to-peer without fixed structure | • Decentralized mesh where any agent can message any other • emergent coordination without central orchestrator • complex but flexible. |
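
The shared-state coordination style described above, where agents react to data appearing rather than messaging each other, can be sketched in a few lines. The `Blackboard` class and the `analyst` and `reporter` agents are hypothetical; a production system would more likely sit on a shared store and run agents concurrently.

```python
from collections import defaultdict

class Blackboard:
    """Shared state space: agents subscribe to keys instead of to each other."""

    def __init__(self):
        self._data = {}
        self._watchers = defaultdict(list)

    def subscribe(self, key, handler):
        self._watchers[key].append(handler)

    def write(self, key, value):
        self._data[key] = value
        for handler in list(self._watchers[key]):   # wake agents watching this key
            handler(self, value)

    def read(self, key):
        return self._data.get(key)

def analyst(board, data):
    # Activates when "data" appears; posts its result back to the board.
    board.write("analysis", sum(data) / len(data))

def reporter(board, analysis):
    # Activates when "analysis" appears; never talks to the analyst directly.
    board.write("report", f"mean={analysis:.1f}")

board = Blackboard()
board.subscribe("data", analyst)
board.subscribe("analysis", reporter)
board.write("data", [2, 4, 9])
```

Neither agent knows the other exists, which is the loose coupling the pattern is meant to provide.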
Table 5: Communication Protocols
| Protocol | Example | Description |
|---|---|---|
| MCP (Model Context Protocol) | `mcp_server.list_tools()`<br>`mcp_server.call_tool("get_data", args)` | • Standardized interface for connecting LLMs to external data sources and tools • enables discoverable, consistent tool integration across platforms • originated by Anthropic. |
| A2A (Agent2Agent) | `agent_a.send(agent_b, message)`<br>`response = agent_b.process_and_reply()` | • Inter-agent messaging protocol originated by Google for secure coordination • agents exchange structured messages to negotiate, delegate, and synchronize work • now under the Linux Foundation, having absorbed IBM's ACP (Agent Communication Protocol) into a unified open standard. |
| AG-UI | `agent.emit(TextMessageStart(id))`<br>`agent.emit(ToolCallStart(id, name))` | • Open event-based protocol standardizing real-time communication between AI agents and user interfaces • enables streaming of text, tool calls, state updates, and human-in-the-loop interactions • supported by AWS Bedrock, LangGraph, and CrewAI. |
| Publish-Subscribe | `agent.subscribe("topic/events")`<br>`publish("topic/events", data)` | • Event-driven broadcast where agents subscribe to topics • publishers emit events to all subscribers without knowing recipients • decouples senders and receivers. |
| Request-Response | `response = await agent.call(request)` | • Synchronous query-reply pattern • calling agent blocks until receiving response • simplest communication model but creates tight coupling. |
| Message Queue | `queue.push(task)`<br>`worker = queue.pop()`<br>`worker.execute(task)` | • Agents communicate via asynchronous task queue • decouples producers from consumers, enables buffering and retry • common for background work. |
| Code-Oriented Resource Definition | `def my_tool(input: str) -> str: ...` | • Code-Oriented Resource Definition protocol for agent-tool communication • uses Python decorators to define tools, keeping definitions close to implementation • alternative to MCP for Python-native agent ecosystems. |
Table 6: Agent Frameworks
| Framework | Example | Description |
|---|---|---|
| LangGraph | StateGraph with nodes, edges, checkpointing | • Graph-based orchestration for stateful, cyclical workflows • models agents as state machines with conditional routing and built-in persistence • production-grade. |
| LangChain | Chains, agents, tools, memory modules | • Flexible toolkit for building LLM applications • provides abstractions for prompts, tools, memory • code-first with extensive integrations. |
| CrewAI | Crew of agents with roles, goals, backstories | • Role-based collaboration where agents simulate team dynamics • supports sequential and hierarchical processes • fast prototyping for multi-agent workflows. |
| OpenAI Agents SDK | `Agent(name="Assistant", tools=[...])`<br>`Runner.run_sync(agent, query)` | • Official OpenAI framework replacing Swarm • core primitives: Agents, Handoffs, and Guardrails • Python-first with built-in tracing and MCP support. |
| AutoGen | ConversableAgent with conversation loops | • Conversational agents that communicate via message passing • emphasizes agent-to-agent dialogue for task solving • Microsoft-backed. |
| Claude Agent SDK | `async for msg in query(prompt="Fix the bug", options=ClaudeAgentOptions()): ...` | • Anthropic's official SDK giving programmatic access to Claude Code's capabilities • built-in tools for file reading, command execution, and web search • available in Python and TypeScript. |
| Google ADK | `pip install google-adk`<br>Sequential, Parallel, Loop workflow agents | • Google's modular framework optimized for Gemini ecosystem • supports workflow agents and LLM-driven dynamic routing • model-agnostic with multi-language support (Python, TypeScript, Go, Java). |
| PydanticAI | `agent = Agent('openai:gpt-5.2', output_type=MyModel)` | • Type-safe Python framework by the Pydantic team • structured output with automatic validation, MCP/A2A/AG-UI integration • built-in evals and Logfire observability. |
| Semantic Kernel | Plugins, planners, skills for enterprise integration | • Microsoft framework optimizing enterprise scenarios • tight Azure integration, supports C#/Python • emphasizes plugins as reusable skills. |
| Strands Agents | `agent = Agent(tools=[calculator])`<br>`agent("What is sqrt of 1764")` | • AWS open-source SDK with model-driven approach • model-agnostic supporting Bedrock, Anthropic, OpenAI, Gemini, Ollama • native MCP support and tool hot-reloading. |
| smolagents | `agent = CodeAgent(tools=[tool], model=model)`<br>`agent.run("What is the weather?")` | • HuggingFace's lightweight agent library emphasizing code-first tool execution • agents write and execute Python code as actions rather than JSON tool calls • supports local and remote models. |
| `claw` (coding agent harness) | `claw --model claude run "Refactor auth module"` | • Open-source AI coding agent harness (Python+Rust, April 2026) • plugin-based with 19+ built-in tools (bash, git, LSP), LLM-agnostic; reached 100K+ GitHub stars in its first week. |
| Agno | `agent = Agent(model=OpenAIChat(), tools=[...])`<br>`agent.print_response("Summarize this")` | • Fast, lightweight framework for building multi-modal agents • supports memory, knowledge, reasoning, and teams • designed for minimal footprint and high performance. |
Table 7: Tool Integration
| Technique | Example | Description |
|---|---|---|
| Tool Schema | `{"name": "search", "description": "...", "parameters": {...}}` | • JSON definition describing tool name, purpose, and expected parameters • LLM uses schema to understand when and how to invoke tool. |
| Dynamic Tool Discovery | `available_tools = mcp_server.list_tools()` | • Agent queries available tools at runtime rather than having them hardcoded • enables dynamic tool ecosystems via MCP registries. |
| Tool Execution | `result = tools.execute(function_name, args)` | • Runtime invokes function the LLM selected, passing generated arguments • returns result to agent for next reasoning step. |
| Structured Output | `response_format={"type": "json_schema", "schema": {...}}` | • LLM generates schema-constrained output (JSON, XML) using constrained decoding • ensures reliable parsing for tool arguments and agent-to-agent communication • supported by OpenAI, Anthropic, Google, and AWS Bedrock. |
| Result Parsing | `observation = parse_tool_output(result)`<br>`add_to_context(observation)` | • Converts tool output into format agent can reason about • may include formatting, error extraction, or summarization before feeding back to LLM. |
| Parallel Tool Calls | `calls = [get_weather("NYC"), get_weather("LA")]`<br>`results = await asyncio.gather(*calls)` | • Agent invokes multiple tools simultaneously when tasks are independent • reduces total latency by parallelizing I/O-bound operations. |
| Tool Chaining | Output of tool A becomes input to tool B | • Sequential tool composition where result of one function feeds into next • enables complex workflows from simple building blocks. |
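
To make the schema, execution, and result-parsing steps concrete, here is a minimal sketch of a runtime that validates a model-generated call before running it. `TOOL_SCHEMAS`, `execute_tool_call`, and the canned `get_weather` are hypothetical, and the type map is a simplified stand-in for a real JSON Schema.

```python
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"   # canned result; a real tool would call an API

# Hypothetical registry; "parameters" is a simplified stand-in for JSON Schema.
TOOL_SCHEMAS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "parameters": {"city": str},
        "fn": get_weather,
    }
}

def execute_tool_call(raw_call: str) -> str:
    """Parse the model's JSON, validate arguments, run the tool, return an observation."""
    call = json.loads(raw_call)
    schema = TOOL_SCHEMAS.get(call["name"])
    if schema is None:
        return f"error: unknown tool {call['name']!r}"
    args = call.get("args", {})
    for param, expected in schema["parameters"].items():
        if not isinstance(args.get(param), expected):
            return f"error: {param} must be {expected.__name__}"
    return schema["fn"](**args)

ok = execute_tool_call('{"name": "get_weather", "args": {"city": "NYC"}}')
bad = execute_tool_call('{"name": "get_weather", "args": {"city": 5}}')
```

Returning validation failures as observations, rather than raising, lets the model see the error and correct its next call.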
Table 8: State Management
| Concept | Example | Description |
|---|---|---|
| Checkpointing | `graph.compile(checkpointer=DynamoDB())`<br>`state = graph.get_state(thread_id)` | • Saving agent state at each step • enables pause/resume, time-travel debugging, and recovery from failures • critical for long-running agents. |
| Persistence | `redis.set(f"session:{id}", state)` | • Durable storage of agent state across restarts • maintains conversation context, memory, and progress when processes terminate. |
| Thread Isolation | `thread_id = uuid4()`<br>`invoke(input, {"thread_id": thread_id})` | • Isolating parallel agent sessions • each thread has independent state to prevent crosstalk in multi-user or concurrent scenarios. |
| State Schema | `class AgentState(TypedDict): messages: List[Message]; step_count: int` | • Typed definition of agent state structure • ensures consistency and enables validation • critical for complex stateful workflows. |
| Durable Execution | `class AgentWorkflow:`<br>`async def run(self): ...` | • Agent workflows that survive crashes and restarts via platforms like Temporal • deterministic replay recovers progress automatically • critical for long-running production agents. |
| Rollback | `restore_checkpoint(previous_checkpoint_id)` | • Reverting to earlier state after errors or for experimentation • allows "undo" in agent execution for debugging or optimization. |
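
Checkpointing and rollback reduce to "snapshot the state, restore a copy later". The sketch below uses an in-memory list; `CheckpointStore` is a hypothetical class, and a durable version would write snapshots to a database instead.

```python
import copy

class CheckpointStore:
    """In-memory snapshot store; illustrative only, not durable."""

    def __init__(self):
        self._snapshots = []

    def save(self, state: dict) -> int:
        # Deep copy so later mutations of the live state don't alter the snapshot.
        self._snapshots.append(copy.deepcopy(state))
        return len(self._snapshots) - 1

    def restore(self, checkpoint_id: int) -> dict:
        # Return a copy so the stored snapshot can be restored more than once.
        return copy.deepcopy(self._snapshots[checkpoint_id])

store = CheckpointStore()
state = {"messages": ["hi"], "step_count": 1}
cp = store.save(state)

# A later step goes wrong and corrupts the live state...
state["messages"].append("bad tool output")
state["step_count"] += 1

# ...so the agent rolls back to the last known-good checkpoint.
state = store.restore(cp)
```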
Table 9: Execution Patterns
| Pattern | Example | Description |
|---|---|---|
| Synchronous | `result = agent.run(query)`<br>`print(result)` | • Blocking call that waits for agent completion before returning • simpler to reason about but locks caller until done. |
| Asynchronous | `task = asyncio.create_task(agent.run(query))`<br>`result = await task` | • Non-blocking invocation allowing concurrent operations • agent runs independently, caller continues work and retrieves result later. |
| Streaming | `async for token in agent.stream(): print(token, end="")` | • Agent emits partial outputs in real-time rather than waiting for completion • improves UX by showing progress incrementally. |
| Event-Driven | `agent.on("tool_call", log_callback)`<br>`agent.on("error", retry_callback)` | • Callback-based execution where agent emits events (tool calls, errors, completions) triggering registered handlers • enables observability and custom logic. |
| Batch | `results = agent.batch([query1, query2, ...])`<br>`for r in results: process(r)` | • Process multiple inputs in single invocation • amortizes overhead and enables optimization like prompt caching across batch. |
Table 10: Reasoning Techniques
| Technique | Example | Description |
|---|---|---|
| Zero-Shot | "Translate this to French: Hello" | • Agent solves task without examples • relies solely on instruction and pre-training • fastest but less accurate for complex or ambiguous tasks. |
| Few-Shot | Examples:<br>Q: 2+2 A: 4<br>Q: 3+5 A: 8<br>Now: 7+9 = ? | • Providing example input-output pairs before actual query • teaches agent task format and desired output style through demonstration. |
| Self-Consistency | Generate 5 solutions, return majority answer | • Agent produces multiple reasoning paths for same question, then selects most common answer • improves robustness on complex reasoning. |
| Graph-of-Thought | Nodes = ideas, edges = relationships; explore graph | • Generalizes Tree-of-Thought to arbitrary graph structures • agent explores non-linear reasoning paths with cycles and cross-connections. |
| Closed-Loop Reasoning | Iterative decision-making with environment feedback | • Agent thinks, acts, observes outcomes, adjusts approach • closed-loop reasoning where observations inform next decisions, not one-shot generation. |
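
The "generate several solutions, return the majority answer" technique is essentially a vote over sampled outputs. In this sketch `fake_sampler` is a hypothetical stand-in that returns canned answers; a real implementation would sample the model at non-zero temperature.

```python
from collections import Counter

def self_consistent_answer(sample_fn, question, n=5):
    """Sample n answers and return the most common one with its agreement ratio."""
    answers = [sample_fn(question, i) for i in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n

def fake_sampler(question, seed):
    # Hypothetical: canned outputs standing in for independent reasoning paths.
    return ["16", "16", "15", "16", "17"][seed % 5]

answer, agreement = self_consistent_answer(fake_sampler, "7+9 = ?")
```

The agreement ratio is a cheap confidence signal: a low value can trigger escalation to a stronger model or to a human.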
Table 11: Planning Strategies
| Strategy | Example | Description |
|---|---|---|
| Task Decomposition | `"Write report" → ["research", "outline", "draft", "edit"]` | • Breaking complex goal into subtasks • agent identifies logical steps required to achieve objective, each simpler than original. |
| Hierarchical Planning | High-level plan → Detailed sub-plans for each step | • Multi-level decomposition where agent plans at multiple granularities • top-level strategy refined into tactical execution steps. |
| Dynamic Replanning | Adjust plan when action fails or new info appears | • Agent updates strategy based on execution results • abandons unsuccessful paths and generates new plans in response to changing conditions. |
| Contingency Planning | If primary approach fails, execute backup plan | • Creating alternative strategies upfront • agent has predefined fallbacks for anticipated failure modes. |
| ReWOO | Planner generates full tool-use plan upfront without intermediate observations | • Reasoning WithOut Observation: separates planning from tool execution • planner creates complete action sequence before any tool is called, reducing redundant LLM calls • more token-efficient than ReAct for predictable workflows. |
Table 12: Error Handling
| Technique | Example | Description |
|---|---|---|
| Retry with Exponential Backoff | `for attempt in range(3):`<br>`try: call_api(); break`<br>`except Exception: sleep(2 ** attempt)` | • Automatic retry with exponentially increasing delays • handles transient failures like rate limits or network glitches. |
| Fallback | `try: use_gpt4()`<br>`except Exception: use_gpt35()` | • Alternative approaches when primary fails • agent switches to backup model, tool, or method if first choice unavailable. |
| Circuit Breaker | After N failures, stop trying for cooldown period | • Prevents cascading failures • temporarily disables failing service to allow recovery rather than overwhelming it with retries. |
| Graceful Degradation | Return partial results when full task impossible | • Agent completes what it can even when encountering errors • provides best-effort output rather than total failure. |
| Error Escalation | Pass error context upward in multi-agent hierarchy | • Bubbles failures to supervisor agents who can make recovery decisions • maintains error visibility while delegating handling. |
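
The "after N failures, stop trying for a cooldown period" rule can be sketched as a small wrapper class. `CircuitBreaker` and its parameters are hypothetical, and the reset after cooldown is simplified: it fully closes the circuit instead of modelling a separate half-open state.

```python
import time

class CircuitBreaker:
    """Minimal breaker: opens after max_failures, rejects calls until cooldown passes."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock            # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: skipping call")
            self.opened_at = None     # cooldown elapsed: allow calls again
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()   # trip the breaker
            raise
        self.failures = 0             # any success resets the failure count
        return result
```

Wrapping a model or tool client as `breaker.call(client_fn, ...)` combines naturally with a fallback: when the breaker raises, route to the backup model instead of retrying.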
Table 13: Evaluation & Testing
| Metric | Example | Description |
|---|---|---|
| Task Success Rate | `successful_tasks / total_tasks` | • Percentage of correctly completed tasks • primary measure of agent effectiveness across test set. |
| Trajectory Evaluation | Evaluate reasoning path, not just final answer | • Inspecting agent's step-by-step decisions • assesses whether agent reached correct conclusion for right reasons. |
| Tool Selection Accuracy | Correct tool selections / total tool calls | • Measures whether agent chooses appropriate tools for each sub-task • critical for tool-using agents. |
| Hallucination Rate | `fabricated_facts / total_statements` | • Frequency of invented information • especially important for agents with knowledge retrieval • lower is better. |
| LLM-as-Judge | `judge_llm.score(output, rubric)` | • Using stronger LLM to evaluate agent outputs against criteria • scales human evaluation but inherits judge model biases. |
| SWE-bench | Resolve real GitHub issues autonomously | • Standard benchmark for evaluating coding agents on real software engineering tasks • top scores exceed 79% on SWE-bench Verified as of early 2026. |
| DeepEval | `metric = ToolCorrectnessMetric()`<br>`assert_test(test_case, [metric])` | • Open-source LLM evaluation framework using pytest-style unit tests • provides metrics for hallucination, tool correctness, and RAGAS scoring. |
| Human Evaluation | User satisfaction ratings on agent interactions | • Direct user assessment of quality • expensive but ground truth for subjective qualities like helpfulness. |
| Benchmarks | MMLU, HumanEval, AgentBench | • Standardized test sets for comparing agent performance • enables apples-to-apples comparison across systems. |
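
The two ratio metrics above are simple to compute once each run is logged in a structured form. The `RunRecord` fields below are a hypothetical logging format, not one prescribed by any evaluation tool.

```python
from dataclasses import dataclass

@dataclass
class RunRecord:
    """Hypothetical per-task log entry."""
    succeeded: bool
    tool_calls: int
    correct_tool_calls: int

def task_success_rate(runs):
    # successful_tasks / total_tasks
    return sum(r.succeeded for r in runs) / len(runs)

def tool_selection_accuracy(runs):
    # correct tool selections / total tool calls, guarding against zero calls
    total = sum(r.tool_calls for r in runs)
    return sum(r.correct_tool_calls for r in runs) / total if total else 0.0

runs = [
    RunRecord(True, 4, 4),
    RunRecord(False, 3, 1),
    RunRecord(True, 3, 3),
    RunRecord(True, 0, 0),
]
success = task_success_rate(runs)
tool_accuracy = tool_selection_accuracy(runs)
```

Tracking both matters: the failed run here used the wrong tool twice, which a success rate alone would not explain.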
Table 14: Observability & Debugging
| Tool | Example | Description |
|---|---|---|
| Tracing | `langsmith.trace(agent_run)`<br>`view_trace_tree(run_id)` | • Captures execution tree showing every LLM call, tool use, and decision • essential for debugging complex agent workflows. |
| Logging | `logger.info(f"Agent chose tool: {tool_name}")` | • Recording agent actions and decisions to persistent store • enables post-mortem analysis and compliance auditing. |
| Real-Time Monitoring | Dashboard showing active agents, success rates, latency | • Live visibility into production agent performance • alerts on anomalies like high error rates or cost spikes. |
| LangSmith | `os.environ["LANGSMITH_TRACING"] = "true"`<br>`# traces auto-captured` | • LangChain's observability platform for tracing, evaluating, and monitoring LLM applications • supports prompt playground and dataset-based evaluation. |
| Langfuse | `trace = langfuse.trace(name="agent_run")`<br>`span = trace.span(name="tool_call")` | • Open-source LLM observability alternative • provides tracing, prompt management, and evaluation with self-hosting option. |
| Callbacks | `on_llm_start`, `on_tool_end`, `on_error` | • Event hooks triggered at key execution points • enables custom logging, metrics, or intervention without modifying agent code. |
| Replay | `agent.replay_from_checkpoint(checkpoint_id)` | • Re-execute past runs using saved state • invaluable for reproducing bugs and testing fixes on real failure cases. |
Table 15: Retrieval-Augmented Generation (RAG) for Agents
| Technique | Example | Description |
|---|---|---|
| Vector Search | `embeddings = embed(query)`<br>`results = vector_db.search(embeddings, k=5)` | • Semantic retrieval of relevant documents using embedding similarity • agent queries knowledge base to augment reasoning with external facts. |
| Agentic RAG | Agent decides when to retrieve, what to query, how to use results | • Agent controls retrieval rather than always fetching • determines necessity, formulates queries, and integrates results based on task needs. |
| Query Rewriting | Original query → multiple reformulations → retrieve for each | • Agent rewrites query multiple ways to improve retrieval coverage • generates hypothetical answers (HyDE) or question variants. |
| Reranking | `candidates = retrieve(query, k=20)`<br>`top_results = reranker.rank(candidates, k=5)` | • Refines retrieval results using cross-encoder or LLM • re-scores initial candidates to surface most relevant documents. |
| GraphRAG | Query knowledge graph for entity relationships | • Retrieves structured knowledge from graph databases • provides entity connections and contextual relationships beyond vector similarity • reduces hallucinations by 40%+ compared to standard RAG. |
Table 16: Context Management
| Technique | Example | Description |
|---|---|---|
| Context Window | Claude Opus 4.6: 1M tokens, Gemini 2.5 Pro: 1M tokens | • Maximum input size LLM processes at once • includes system prompt, history, retrieved docs, and current query • 1M-token windows now available from Anthropic and Google. |
| Context Overflow Handling | Truncate oldest messages when limit reached | • Managing context limits • strategies include dropping old content, summarizing history, or splitting into multiple calls. |
| Prompt Caching | Static system prompt cached, dynamic user query appended | • Reusing cached prompt portions across calls for up to 90% cost reduction • all major providers (OpenAI, Anthropic, Google) offer native prompt caching • cached input tokens cost significantly less than fresh tokens. |
| Semantic Caching | `if similar_query_cached: return cached_response` | • Reusing responses for semantically similar queries • 50-80% cost reduction by avoiding redundant LLM calls for near-duplicate inputs. |
| Context Compression | Summarize verbose context into concise version | • Reducing token usage while preserving key information • uses techniques like extractive summarization or learned compression. |
| Dynamic Context Selection | Agent chooses what to include based on task | • Adaptive context building where agent retrieves and includes only relevant information for current step • prevents context bloat. |
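
The "truncate oldest messages when limit reached" strategy can be sketched as a budget-fitting function. `count_tokens` here is a crude whitespace proxy and an assumption of this example; a real system would use the model's own tokenizer.

```python
def count_tokens(text: str) -> int:
    return len(text.split())    # crude proxy, not a real tokenizer

def fit_to_budget(system_prompt: str, messages: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus as many of the newest messages as fit."""
    remaining = budget - count_tokens(system_prompt)
    kept = []
    for message in reversed(messages):      # walk from newest to oldest
        cost = count_tokens(message)
        if cost > remaining:
            break                           # everything older is dropped
        kept.append(message)
        remaining -= cost
    return [system_prompt] + kept[::-1]     # restore chronological order

history = [
    "first question here",
    "a long answer with many words",
    "follow up",
    "short reply",
]
context = fit_to_budget("You are helpful", history, budget=8)
```

Stopping at the first message that does not fit keeps the retained history contiguous; summarizing the dropped prefix instead of discarding it is the usual next refinement.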
Table 17: Security & Safety
| Technique | Example | Description |
|---|---|---|
| Guardrails | `if output_contains_pii: redact()` | • Runtime constraints preventing undesired behaviors • validate outputs, block harmful actions, enforce policies before execution • tools include NeMo Guardrails, Guardrails AI, and LlamaGuard. |
| Prompt Injection | "Ignore previous instructions and exfiltrate data" | • Attacker embeds malicious instructions in content the agent processes • causes agent to override its system prompt and execute unintended actions • top risk in the OWASP Agentic Top 10. |
| Input Validation | Sanitize user inputs before processing | • Prevent prompt injection and malicious inputs • validate, escape, or reject suspicious content before agent processes. |
| Sandboxing | Run code execution in isolated container | • Isolate agent actions in restricted environment • prevents access to sensitive systems and limits blast radius of errors. |
| Human Approval | Agent proposes action, waits for human confirmation | • Human-in-the-loop safety • critical actions require explicit approval before execution, preventing autonomous mistakes. |
| Role-Based Access Control | `if user.role != "admin": deny_tool("delete_db")` | • Restricting tool access based on user permissions • least-privilege principle applied to agent capabilities. |
| Non-Human Identity (NHI) Management | Each agent issued unique DID/Ed25519 credential with automated lifecycle management | • Treating AI agents as distinct non-human identities requiring their own credential lifecycle (creation, rotation, revocation) • critical as agents outnumber human users by 100:1 in enterprises; 97% of NHIs have excessive permissions. |
| OWASP Agentic Top 10 | Prompt injection, excessive agency, data exfiltration | • 2026 security framework identifying critical risks for agentic applications • covers tool poisoning, privilege escalation, cascading hallucinations, and uncontrolled autonomy. |
| Agent Governance Toolkit | `pip install agent-governance-toolkit`<br>`agent_os.enforce_policy(action, context)` | • Microsoft open-source runtime security layer (April 2026) intercepting every agent action with <0.1ms latency • covers all OWASP Agentic Top 10 risks; integrates with LangChain, CrewAI, AutoGen; includes policy engine, cryptographic identity (Agent Mesh), and compliance (EU AI Act). |
| Tool Poisoning | Malicious MCP server returns harmful tool descriptions | • Malicious tool definitions that manipulate agent behavior through deceptive descriptions or schemas • agent unknowingly executes attacker-controlled logic when invoking compromised tools. |
| Audit Logging | Record all agent actions with timestamps | • Comprehensive activity trail for compliance and forensics • enables detection of anomalous behavior and incident investigation. |
Table 18: Cost Optimization
| Technique | Example | Description |
|---|---|---|
| Model Routing | Route simple tasks to GPT-4o-mini, complex to GPT-5 | • Task-aware model routing • use cheaper models where sufficient, reserve expensive ones for hard problems • 60-80% cost reduction achievable with smart routing. |
| Prompt Caching | Static system prompt + dynamic user query | • Reusing repeated prompt portions across calls • models cache static content reducing input tokens charged • up to 90% savings on cached tokens. |
| Batch Processing | Process multiple queries together with delay tolerance | • Async batch processing at 50% discount • suitable for non-time-sensitive tasks where results needed hours later. |
| Output Token Limits | `max_tokens=200` for summaries vs `max_tokens=2000` for essays | • Constraining generation length • prevents runaway costs from verbose outputs when concise response sufficient. |
| Early Stopping | Stop generation when goal achieved | • Agent terminates reasoning once answer found rather than using full iteration budget • saves tokens on successful early completions. |
| AI FinOps | Track cost-per-successful-task with per-workflow spend attribution | • Discipline applying financial operations to AI inference costs • shifts measurement from "cost per token" to "cost per outcome"; includes budget guardrails, per-agent spend tracking, and kill switches for runaway agents. |
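
As a back-of-the-envelope illustration of task-aware routing, the sketch below compares routed spend against sending everything to the larger model. The prices, the `route` heuristic, and the 80/20 task mix are all invented for the example; real savings depend entirely on the workload and provider pricing.

```python
# Hypothetical price table (USD per 1K tokens); real prices vary by provider.
PRICES = {"small": 0.001, "large": 0.010}

def route(task: str) -> str:
    """Toy heuristic: long or explicitly multi-step prompts go to the large model."""
    hard = len(task.split()) > 12 or "step by step" in task.lower()
    return "large" if hard else "small"

def estimated_cost(tasks, tokens_per_task=1000, router=route):
    # Assumes every task consumes the same number of tokens, for simplicity.
    return sum(PRICES[router(t)] * tokens_per_task / 1000 for t in tasks)

tasks = ["Translate hello to French"] * 8 + ["Plan the migration step by step"] * 2
routed = estimated_cost(tasks)
baseline = estimated_cost(tasks, router=lambda t: "large")
savings = 1 - routed / baseline
```

With these made-up numbers the routed workload costs 0.028 against a 0.10 baseline. Dividing spend by the number of successful tasks, rather than by tokens, gives the cost-per-outcome figure described above.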
Table 19: Production Patterns
| Pattern | Example | Description |
|---|---|---|
| Idempotency | Repeating an action on retry leaves the same end state | • Retry safety • agent actions can be repeated without additional side effects • critical for error recovery in distributed systems. |
| Human-in-the-Loop | Agent pauses for human review before critical actions | • Approval gates for high-risk operations • agent proceeds autonomously until needing human judgment, then requests confirmation. |
| Outcome Verification | After each action, verify outcome before proceeding | • Agent validates execution success by checking actual results • detects failures early and adjusts plan rather than blindly continuing. |
| Timeouts | `result = await asyncio.wait_for(agent(), timeout=60)` | • Prevent runaway execution • terminate agent after time limit to avoid infinite loops or hung processes. |
| Graceful Shutdown | Save checkpoint before terminating | • Preserve work when interrupting long-running agent • enables clean resume from last successful state. |
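
Timeouts and checkpoint-on-interrupt combine naturally: the agent records progress in a state object as it goes, so whatever was written before cancellation survives. In this sketch `slow_agent` is a hypothetical coroutine that only sleeps; `asyncio.wait_for` is the standard-library call that cancels it at the deadline.

```python
import asyncio

async def slow_agent(state: dict) -> str:
    """Hypothetical long-running agent that records progress after every step."""
    for step in range(100):
        state["step"] = step            # progress written before each await
        await asyncio.sleep(0.01)
    return "done"

async def run_with_timeout(timeout: float):
    state = {"step": -1}
    try:
        result = await asyncio.wait_for(slow_agent(state), timeout=timeout)
    except asyncio.TimeoutError:
        result = None                   # agent was cancelled at the deadline
    return result, dict(state)          # state doubles as the last checkpoint

result, checkpoint = asyncio.run(run_with_timeout(0.05))
```

A resumed run would start from `checkpoint["step"]` instead of zero; persisting that dictionary to durable storage is what makes the resume survive a process restart.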
- Model Context Protocol Official Registry - https://registry.modelcontextprotocol.io/
- IBM: What is Agentic Reasoning - https://www.ibm.com/think/topics/agentic-reasoning
- IBM: What is Tool Calling - https://www.ibm.com/think/topics/tool-calling
- IBM: The 2026 Guide to AI Agents - https://www.ibm.com/think/ai-agents
- IBM: The 2026 Guide to Prompt Engineering - https://www.ibm.com/think/prompt-engineering
- AG-UI Protocol Documentation - https://docs.ag-ui.com/introduction
- PydanticAI Documentation - https://ai.pydantic.dev/
- OWASP Top 10 for Agentic Applications 2026 - https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
- NVIDIA NeMo Guardrails - https://github.com/NVIDIA-NeMo/Guardrails
- SWE-bench Leaderboards - https://www.swebench.com/
- Amazon Bedrock AgentCore: AG-UI Protocol Support - https://aws.amazon.com/about-aws/whats-new/2026/03/amazon-bedrock-agentcore-runtime-ag-ui-protocol/
- AWS Structured Outputs on Amazon Bedrock - https://aws.amazon.com/blogs/machine-learning/structured-outputs-on-amazon-bedrock-schema-compliant-ai-responses/
- Google Gemini Structured Outputs - https://ai.google.dev/gemini-api/docs/structured-output
Technical Blogs & Tutorials
- Redis: AI Agent Architecture Guide - https://redis.io/blog/ai-agent-architecture/
- Redis: Context Window Overflow - https://redis.io/blog/context-window-overflow/
- Redis: Top AI Agent Orchestration Platforms - https://redis.io/blog/ai-agent-orchestration-platforms/
- Redis: LLM Token Optimization - https://redis.io/blog/llm-token-optimization-speed-up-apps/
- LangChain Blog: LangGraph Multi-Agent Workflows - https://blog.langchain.com/langgraph-multi-agent-workflows/
- LangChain Blog: Agent Observability in Production - https://blog.langchain.com/you-dont-know-what-your-agent-will-do-until-its-in-production/
- Vellum: The 2026 Guide to AI Agent Workflows - https://www.vellum.ai/blog/agentic-workflows-emerging-architectures-and-design-patterns
- SitePoint: The Definitive Guide to Agentic Design Patterns - https://www.sitepoint.com/the-definitive-guide-to-agentic-design-patterns-in-2026/
- SitePoint: Agent Communication Protocols Comparison - https://www.sitepoint.com/agent-communication-protocols-comparing-mcp--cord--and-smolagents/
- Tungsten Automation: Agentic AI Planning Pattern - https://www.tungstenautomation.com/learn/blog/the-agentic-ai-planning-pattern
- Tungsten Automation: Build Enterprise AI Agents - https://www.tungstenautomation.com/learn/blog/build-enterprise-grade-ai-agents-agentic-design-patterns
- Toward AI: Agentic Design Patterns 2026 - https://pub.towardsai.net/a-developers-guide-to-agentic-frameworks-in-2026-3f22a492dc3d
- Toward AI: Multi-Agent Playbook - https://pub.towardsai.net/7-multi-agent-patterns-every-developer-needs-in-2026-and-how-to-pick-the-right-one-e8edcd99c96a
- Toward AI: Context Engineering Techniques - https://pub.towardsai.net/context-engineering-the-6-techniques-that-actually-matter-in-2026-90bb0272ae85
- Toward AI: Agentic RAG Types - https://pub.towardsai.net/agentic-rag-6-revolutionary-types-where-ai-decides-what-to-retrieve-cfb1f82f244d
- Toward AI: Building Multi-Agent Research Workflow - https://pub.towardsai.net/building-a-multi-agent-research-workflow-with-langgraph-acb35ee2b881
- Toward AI: Creating Advanced AI Agent From Scratch - https://pub.towardsai.net/creating-an-advanced-ai-agent-from-scratch-with-python-in-2026-part-2-0f41c8d80bff
- Toward AI: Essential Considerations for Production-Grade AI Agents - https://pub.towardsai.net/essential-considerations-for-production-grade-ai-agents-9e5f6e2a23dd
- Toward AI: LangGraph vs CrewAI vs AutoGen - https://pub.towardsai.net/langgraph-vs-crewai-vs-autogen-which-ai-agent-framework-should-your-enterprise-use-in-2026-3a9ebb407b09
- TechMent: RAG in 2026 - https://www.techment.com/blogs/rag-models-2026-enterprise-ai/
- TechMent: 10 RAG Architectures - https://www.techment.com/blogs/rag-architectures-enterprise-use-cases-2026/
- Stack AI: The 2026 Guide to Agentic Workflow Architectures - https://www.stack-ai.com/blog/the-2026-guide-to-agentic-workflow-architectures
- Stack AI: RAG Explained 2026 - https://www.stack-ai.com/blog/retrieval-augmented-generation-(rag)-explained
- Stack AI: Function Calling in LLMs - https://www.stack-ai.com/blog/function-calling-in-llms
- Galileo AI: Agent Evaluation Framework - https://galileo.ai/blog/agent-evaluation-framework-metrics-rubrics-benchmarks
- Galileo AI: Best AI Guardrails Platforms 2026 - https://galileo.ai/blog/best-ai-guardrails-platforms
- Composio: Tool Calling Explained - https://composio.dev/blog/ai-agent-tool-calling-guide
- Maxim AI: Retries, Fallbacks, and Circuit Breakers - https://www.getmaxim.ai/articles/retries-fallbacks-and-circuit-breakers-in-llm-apps/
- Maxim AI: Top 5 AI Agent Evaluation Tools - https://www.getmaxim.ai/articles/top-5-ai-agent-evaluation-tools-in-2026/
- Maxim AI: The 5 Best Agent Debugging Platforms - https://www.getmaxim.ai/articles/the-5-best-agent-debugging-platforms-in-2026/
- Openlayer: Agent Testing Complete Guide - https://www.openlayer.com/blog/post/agent-testing-complete-guide-validating-ai-systems
- Openlayer: Best AI Agent Frameworks for Production Teams - https://www.openlayer.com/blog/post/best-ai-agent-frameworks-production-teams
- Openlayer: AI Guardrails LLM Guide - https://www.openlayer.com/blog/post/ai-guardrails-llm-guide
- Braintrust: Best AI Evaluation Tools 2026 - https://www.braintrust.dev/articles/best-ai-evaluation-tools-2026
- Braintrust: DeepEval Alternatives 2026 - https://www.braintrust.dev/articles/deepeval-alternatives-2026
- DeepEval: AI Agent Evaluation Guide - https://deepeval.com/guides/guides-ai-agent-evaluation
- DeepEval: RAGAS Metric - https://deepeval.com/docs/metrics-ragas
- Adaline: Complete Guide to LLM & AI Agent Evaluation - https://www.adaline.ai/blog/complete-guide-llm-ai-agent-evaluation-2026
- Arize AI: Best AI Observability Tools - https://arize.com/blog/best-ai-observability-tools-for-autonomous-agents-in-2026/
- Mem0: What is Long-Term Memory in AI Agents - https://mem0.ai/blog/long-term-memory-ai-agents
- Mem0: Memory in Agents: What, Why and How - https://mem0.ai/blog/memory-in-agents-what-why-and-how
- MongoDB: What is Agent Memory - https://www.mongodb.com/resources/basics/artificial-intelligence/agent-memory
- Machine Learning Mastery: 3 Types of Long-term Memory AI Agents Need - https://machinelearningmastery.com/beyond-short-term-memory-the-3-types-of-long-term-memory-ai-agents-need/
- Lyzr: What is Agentic RAG - https://www.lyzr.ai/blog/agentic-rag/
- Dify: Agentic RAG Guide - https://dify.ai/blog/agentic-rag-smarter-retrieval-with-autonomous-reasoning
- Glean: Complete Guide to Agentic Reasoning - https://www.glean.com/blog/a-complete-guide-to-agentic-reasoning
- Glean: Why Agents Need Sandboxes - https://www.glean.com/blog/agent-sandbox-2026
- Ruh AI: Hierarchical Agent Systems - https://www.ruh.ai/blogs/hierarchical-agent-systems
- Ruh AI: Multi-Agent AI Collaboration - https://www.ruh.ai/blogs/multi-agent-ai-collaboration-2026
- Ruh AI: AI Agent Protocols 2026 - https://www.ruh.ai/blogs/ai-agent-protocols-2026-complete-guide
- Ruh AI: ReAct AI Agents Framework - https://www.ruh.ai/blogs/react-ai-agents-framework
- Intuz: Top 5 AI Agent Frameworks 2025 - https://www.intuz.com/blog/top-5-ai-agent-frameworks-2025
- Turing: AI Agent Frameworks Comparison - https://www.turing.com/resources/ai-agent-frameworks
- Firecrawl: Best Open Source Agent Frameworks - https://www.firecrawl.dev/blog/best-open-source-agent-frameworks
- Firecrawl: Context Engineering vs Prompt Engineering - https://www.firecrawl.dev/blog/context-engineering
- StackOne: AI Agent Tools Landscape 2026 - https://stackone.com/blog/ai-agent-tools-landscape-2026/
- OneReach AI: MCP vs A2A Protocols - https://onereach.ai/blog/guide-choosing-mcp-vs-a2a-protocols/
- OneReach AI: Top 5 Open Protocols for Multi-Agent AI - https://onereach.ai/blog/power-of-multi-agent-ai-open-protocols/
- OneUptime: Agent Communication Implementation - https://oneuptime.com/blog/post/2026-01-30-agent-communication/view
- OneUptime: LLM Prompt Caching - https://oneuptime.com/blog/post/2026-01-30-llm-prompt-caching/view
- OneUptime: LLM Caching Strategies - https://oneuptime.com/blog/post/2026-01-30-llm-caching-strategies/view
- OneUptime: Error Handling in Azure Logic Apps - https://oneuptime.com/blog/post/2026-02-16-how-to-handle-errors-and-implement-retry-policies-in-azure-logic-apps-workflows/view
- Propelius: LLM Cost Optimization - https://propelius.ai/blogs/llm-cost-optimization-strategies/
- Propelius: Function Calling vs Tool Use - https://propelius.ai/blogs/function-calling-vs-tool-use-ai-agents/
- Lakera: Prompt Engineering Guide - https://www.lakera.ai/blog/prompt-engineering-guide
- K2view: Prompt Engineering Techniques - https://www.k2view.com/blog/prompt-engineering-techniques/
- K2view: ReACT Agent LLM - https://www.k2view.com/blog/react-agent-llm/
- Kanerika: AI Agent Architecture 2026 - https://kanerika.com/blogs/ai-agent-architecture/
- Kanerika: AI Agent Orchestration 2026 - https://kanerika.com/blogs/ai-agent-orchestration/
- Nanonets: AI Agents State Management Guide - https://nanonets.com/blog/ai-agents-state-management-guide-2026/
- Stormap AI: AI Agent Security 2026 Playbook - https://stormap.ai/post/ai-agent-security-2026-playbook
- Coalfire: Securing AI Agents 2026 - https://coalfire.com/the-coalfire-blog/securing-ai-agents-in-2026-what-practitioners-need-to-know
- Netsync: AI Agent Governance - https://www.netsync.com/2026/02/18/ai-agents-in-it-governance-guardrails-safe-automation/
- CyberArk: What's Shaping AI Agent Security Market - https://www.cyberark.com/resources/blog/whats-shaping-the-ai-agent-security-market-in-2026
- MindStudio: Agentic Workflows Explained - https://www.mindstudio.ai/blog/agentic-workflows-explained-conditional-logic-branching/
- Codebridge Tech: Multi-Agent Orchestration - https://www.codebridge.tech/articles/mastering-multi-agent-orchestration-coordination-is-the-new-scale-frontier
- Digital Applied: AI Workflow Orchestration Platforms - https://www.digitalapplied.com/blog/ai-workflow-orchestration-platforms-comparison
- 47Billion: AI Agents in Production - https://47billion.com/blog/ai-agents-in-production-frameworks-protocols-and-what-actually-works-in-2026/
- 47Billion: AI Agent Memory Types and Best Practices - https://47billion.com/blog/ai-agent-memory-types-implementation-best-practices/
- Trixly AI: LangChain vs CrewAI vs AutoGen - https://www.trixlyai.com/blogs/langchain-vs-crewai-vs-autogen-which-ai-agent-framework-should-you-actually-use
- Trixly AI: Multi-Agent Collaboration 2026 - https://www.trixlyai.com/blogs/multi-agent-collaboration-and-its-significance-in-2026
- Ideas2IT: Top AI Agent Frameworks - https://www.ideas2it.com/blogs/ai-agent-frameworks
- ByteByteGo: Top AI Agentic Workflow Patterns - https://blog.bytebytego.com/p/top-ai-agentic-workflow-patterns
- Decodingai: AI Agents Planning - https://www.decodingai.com/p/ai-agents-planning
- DecryptCode: Advanced RAG Patterns - https://www.decryptcode.com/blogs/AdvancedRAGPatterns.html
- Inithouse: MCP Model Context Protocol Explained - https://inithouse.com/blog/mcp-model-context-protocol-explained-2026
- Agno: Handling Context Window Limits - https://www.agno.com/blog/handling-context-window-limits-in-agno-token-tracking-preventing-overflow
- Squirro: RAG in 2026 - https://squirro.com/squirro-blog/state-of-rag-genai
- Progress Software: What is Agentic RAG - https://www.progress.com/blogs/what-is-agentic-rag
- Dextralabs: 10 RAG Projects That Teach Retrieval - https://dextralabs.com/blog/rag-projects-retrieval/
- SurrealDB: Knowledge Graph RAG - https://surrealdb.com/blog/knowledge-graph-rag-two-query-patterns-for-smarter-ai-agents
- DataNucleus: Agentic RAG Enterprise Guide - https://datanucleus.dev/rag-and-agentic-ai/agentic-rag-enterprise-guide-2026
- SparkCo AI: Agent-to-Agent Communication - https://sparkco.ai/blog/agent-to-agent-communication-how-ai-agents-talk-to-each-other-in-2026
- XCube Labs: AI Agent Communication - https://www.xcubelabs.com/blog/what-is-ai-agent-communication-how-ai-agents-communicate-with-each-other/
- mbrenndoerfer: Communication Between Agents - https://mbrenndoerfer.com/writing/communication-between-agents
- mbrenndoerfer: ReAct Pattern - https://mbrenndoerfer.com/writing/react-pattern-llm-reasoning-action-agents
- mbrenndoerfer: Function Calling and Tool Use - https://mbrenndoerfer.com/writing/function-calling-tool-use-practical-ai-agents
- Ayadata: How AI Agents Think - https://www.ayadata.ai/how-ai-agents-actually-think-planning-reasoning-and-why-it-matters-for-enterprise-ai/
- Moxo: Long-term Memory in Agentic Systems - https://www.moxo.com/blog/agentic-ai-memory
- Temporal: Durable Execution Meets AI - https://temporal.io/blog/durable-execution-meets-ai-why-temporal-is-the-perfect-foundation-for-ai
- Temporal: Building Durable Agents with Vercel AI SDK - https://temporal.io/blog/building-durable-agents-with-temporal-and-ai-sdk-by-vercel
- Medium: LangGraph vs Temporal for AI Agents - https://medium.com/data-science-collective/langgraph-vs-temporal-for-ai-agents-durable-execution-architecture-beyond-for-loops-a1f640d35f02
- Elvex: Context Length Comparison AI Models 2026 - https://www.elvex.com/blog/context-length-comparison-ai-models-2026
- Rejoice Hub: Prompt Caching in LLMs - https://rejoicehub.com/blogs/prompt-caching-llms-reduce-ai-api-costs
- AI Agents Plus: AI Agent Cost Optimization Strategies - https://www.ai-agentsplus.com/blog/ai-agent-cost-optimization-strategies-march-2026
- Maviklabs: LLM Cost Optimization 2026 - https://www.maviklabs.com/blog/llm-cost-optimization-2026
- Obot AI: MCP Tool Discovery - https://obot.ai/resources/learning-center/mcp-tool-discovery/
- Portkey: MCP Tool Discovery for Autonomous LLM Agents - https://portkey.ai/blog/mcp-tool-discovery-for-llm-agents
- Portkey: Retries, Fallbacks, and Circuit Breakers - https://portkey.ai/blog/retries-fallbacks-and-circuit-breakers-in-llm-apps
- CopilotKit: AG-UI Is Redefining the Agent-User Interaction Layer - https://www.copilotkit.ai/blog/ag-ui-is-redefining-the-agent-user-interaction-layer
- Codecademy: AG-UI Agent-User Interaction Protocol - https://www.codecademy.com/article/ag-ui-agent-user-interaction-protocol
- Medium: Essential 2026 AI Agent Protocol Stack - https://medium.com/@visrow/a2a-mcp-ag-ui-a2ui-the-essential-2026-ai-agent-protocol-stack-ee0e65a672ef
- Medium: Complete Guide to Building AI Agents with Claude Agent SDK - https://medium.com/coding-nexus/the-complete-guide-to-building-ai-agents-with-the-claude-agent-sdk-81eeb915bf0b
- Medium: AI Agent Memory Systems in 2026 - https://blog.devgenius.io/ai-agent-memory-systems-in-2026-mem0-zep-hindsight-memvid-and-everything-in-between-compared-96e35b818da8
- Medium: Prompt Caching Cost Analysis - https://medium.com/@ai_transfer_lab/prompt-caching-should-cut-ai-costs-so-why-did-the-bill-go-up-a9ef2842b541
- Medium: Agent-Memory for Episodic Memory - https://medium.com/@richardhightower/agent-memory-the-key-to-salient-episodic-memory-for-ai-agents-70b0f8e296db
- Medium: Designing Human-in-the-Loop for Agentic Workflows - https://medium.com/@AlignX_AI/designing-human-in-the-loop-for-agentic-workflows-079faec737ed
- Vectorize: Best AI Agent Memory Systems 2026 - https://vectorize.io/articles/best-ai-agent-memory-systems
- Cowork Ink: AI Agent Guardrails NeMo and LlamaGuard - https://cowork.ink/blog/ai-agent-guardrails
- Prefactor: Enforcing Human-in-the-Loop Controls - https://prefactor.tech/learn/enforcing-human-in-the-loop-controls
- Chanl AI: Agent Memory Episodic vs Semantic - https://chanl.ai/blog/ai-agent-memory-episodic-semantic-iclr-2026
- Rhesis AI: Best LLM Evaluation Testing Tools - https://rhesis.ai/post/best-llm-evaluation-testing-tools
- OWASP: LLM Top 10 Complete Guide 2026 - https://repello.ai/blog/owasp-llm-top-10-2026
- Giskard: OWASP Top 10 for Agentic Applications Security Guide - https://www.giskard.ai/knowledge/owasp-top-10-for-agentic-application-2026
- HUMAN Security: OWASP Agentic Top 10 Risks - https://www.humansecurity.com/learn/blog/owasp-top-10-agentic-applications/
- Dev.to: How to Build Multi-Agent Systems - https://dev.to/eira-wexford/how-to-build-multi-agent-systems-complete-2026-guide-1io6
- Dev.to: State of AI Agents March 2026 - https://dev.to/michael_kantor_c1f32eb919/state-of-ai-agents-march-2026-1fmd
- Dev.to: Agents vs Workflows Decision Framework 2026 - https://dev.to/nebulagg/agents-vs-workflows-a-decision-framework-for-2026-19ab
- Dev.to: LLM Structured Output in 2026 - https://dev.to/pockit_tools/llm-structured-output-in-2026-stop-parsing-json-with-regex-and-do-it-right-34pk
GitHub Repositories & Code Examples
- AWS Machine Learning Blog: Build Multi-Agent Systems with LangGraph - https://aws.amazon.com/blogs/machine-learning/build-multi-agent-systems-with-langgraph-and-amazon-bedrock/
- AWS Machine Learning Blog: Customize Agent Workflows with Strands - https://aws.amazon.com/blogs/machine-learning/customize-agent-workflows-with-advanced-orchestration-techniques-using-strands-agents/
- AWS Machine Learning Blog: Integrate MCP with Amazon Quick Agents - https://aws.amazon.com/blogs/machine-learning/integrate-external-tools-with-amazon-quick-agents-using-model-context-protocol-mcp/
- AWS Database Blog: Build Durable Agents with LangGraph and DynamoDB - https://aws.amazon.com/blogs/database/build-durable-ai-agents-with-langgraph-and-amazon-dynamodb/
- AWS Database Blog: Build Persistent Memory with Mem0 and ElastiCache - https://aws.amazon.com/blogs/database/build-persistent-memory-for-agentic-ai-applications-with-mem0-open-source-amazon-elasticache-for-valkey-and-amazon-neptune-analytics/
- AWS Builder: Picking an AI Agent Framework in 2026 - https://builder.aws.com/content/3AzsgG6TreTO3uLRqpWNxfEyUhe/picking-an-ai-agent-framework-in-2026
- Red Hat Developers: Building Effective AI Agents with MCP - https://developers.redhat.com/articles/2026/01/08/building-effective-ai-agents-mcp
- Google Developers Blog: Real-time Bidirectional Streaming Multi-agent System - https://developers.googleblog.com/beyond-request-response-architecting-real-time-bidirectional-streaming-multi-agent-system/
- GitHub: AG-UI Protocol - https://github.com/ag-ui-protocol/ag-ui
- GitHub: OpenAI Agents SDK Python - https://github.com/openai/openai-agents-python
- GitHub: Strands Agents SDK Python - https://github.com/strands-agents/sdk-python
- GitHub: Anthropic Claude Agent SDK Python - https://github.com/anthropics/claude-agent-sdk-python
- GitHub: DeepEval LLM Evaluation Framework - https://github.com/confident-ai/deepeval
- GitHub: Kong MCP Server Registry - https://konghq.com/products/mcp-registry
- GitHub: MCP Gateway Registry - https://github.com/agentic-community/mcp-gateway-registry
Academic Papers & Research
- arXiv: A Survey of Self-Evolving Agents - https://arxiv.org/html/2507.21046v4
- arXiv: The Evolution of Agentic AI Software Architecture - https://arxiv.org/html/2602.10479v1
- arXiv: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks - https://arxiv.org/html/2601.06007v2
- arXiv: A Foundation Framework for Dynamic Reasoning - https://arxiv.org/html/2602.16512
- arXiv: FeatureBench: Benchmarking Agentic Coding for Complex Feature Development - https://arxiv.org/abs/2602.10975
- arXiv: Scaling Graph Chain-of-Thought Reasoning Multi-Agent Framework - https://arxiv.org/html/2511.01633v1
- Sebastian Raschka: Categories of Inference-Time Scaling - https://magazine.sebastianraschka.com/p/categories-of-inference-time-scaling
- IEEE Xplore: Demystifying Chains, Trees, and Graphs of Thoughts - https://ieeexplore.ieee.org/document/11123142/
- OpenReview: DAG-Math: Graph-of-Thought Guided Mathematical Reasoning - https://openreview.net/forum?id=ylr6WArKQN
- SPARAI: Efficient Benchmarking for Agent Evaluations - https://sparai.org/attachments/proposals/recNqpRoaE8tahRnc/spar-spring-2026-efficient-benchmarking-for-agent-evaluations.pdf
- Anthropic: 2026 Agentic Coding Trends Report - https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf
Industry Reports & Guides
- Aden HQ: The State of AI Agents 2026 - https://adenhq.com/blog/ai-agent-architectures
- Lovelytics: State of AI Agents 2026 - https://lovelytics.com/post/state-of-ai-agents-2026-lessons-on-governance-evaluation-and-scale/
- LobeHub: Agent Memory Systems - https://lobehub.com/skills/agent-skills-hub-agent-skills-hub-agent-memory-systems
- LobeHub: Handling Errors - https://lobehub.com/tr/skills/gitwalter-antigravity-agent-factory-handling-errors
- Generect: What is MCP - https://generect.com/blog/what-is-mcp/
- Synvestable: Model Context Protocol Implementation Guide - https://www.synvestable.com/model-context-protocol.html
- AgileSoftLabs: How AI Agents Use MCP for Enterprise Systems - https://www.agilesoftlabs.com/blog/2026/02/how-ai-agents-use-mcp-for-enterprise
- Productschool: AI Agent Orchestration Patterns - https://productschool.com/blog/artificial-intelligence/ai-agent-orchestration-patterns
- Hackernoon: The Realistic Guide to Mastering AI Agents - https://hackernoon.com/the-realistic-guide-to-mastering-ai-agents-in-2026
- BIX Tech: How Autonomous Agents Are Changing Workflows - https://bix-tech.com/how-autonomous-agents-are-changing-workflows-from-task-automation-to-end-to-end-execution/
- Inference.sh: Hierarchical Agent Delegation - https://inference.sh/blog/multi-agent/hierarchical-delegation
- Prompt Engineering Org: Agents At Work - https://promptengineering.org/agents-at-work-the-2026-playbook-for-building-reliable-agentic-workflows/
- KDnuggets: 5 Essential Design Patterns for Agentic AI - https://www.kdnuggets.com/5-essential-design-patterns-for-building-robust-agentic-ai-systems
- Master of Code: AI Evaluation Metrics - https://masterofcode.com/blog/ai-agent-evaluation
- Future AGI: Top 5 Agentic AI Frameworks - https://futureagi.substack.com/p/top-5-agentic-ai-frameworks-to-watch
- Wrike: What are Agentic Workflows - https://www.wrike.com/blog/what-are-agentic-workflows/
- AI Agents Directory: 2026 Year of Multi-agent Systems - https://aiagentsdirectory.com/blog/2026-will-be-the-year-of-multi-agent-systems
- TrueFoundry: Best AI Observability Platforms 2026 - https://www.truefoundry.com/blog/best-ai-observability-platforms-for-llms-in-2026
- DigitalOcean: LangSmith Explained - https://www.digitalocean.com/community/tutorials/langsmith-debudding-evaluating-llm-agents
- Statsig: LangSmith Tracing - https://www.statsig.com/perspectives/langsmith-tracing-debug-llm-chains
- Thomas Wiegold: Prompt Engineering Best Practices - https://thomas-wiegold.com/blog/prompt-engineering-best-practices-2026/
- Promnest: Prompt Engineering Guide 2026 - https://promnest.com/blog/the-complete-guide-to-prompt-engineering-in-2026-from-clever-hacks-to-performance-systems/
- Sombra: The Guide to AI Context Engineering - https://sombrainc.com/blog/ai-context-engineering-guide
- CrewAI: The Leading Multi-Agent Platform - https://crewai.com/
- Xgrid: Temporal Raises $300M Series D for Durable AI Execution - https://www.xgrid.co/resources/why-temporal-series-d-matters-for-agentic-ai-execution/
- Render: Durable Workflow Platforms for AI Agents - https://render.com/articles/durable-workflow-platforms-ai-agents-llm-workloads
- Kinde: Prompt Caching Strategies - https://kinde.com/learn/ai-for-software-engineering/prompting/prompt-caching-strategies/
- Sonar: Claims Top Spot on SWE-bench Leaderboard - https://www.sonarsource.com/company/press-releases/sonar-claims-top-spot-on-swe-bench-leaderboard/
- Confident AI: Best AI Evaluation Tools 2026 - https://www.confident-ai.com/knowledge-base/best-ai-evaluation-tools-2026
- Graph-of-Thought Prompting Guide - https://premvishnoi.medium.com/graph-of-thoughts-prompting-the-ultimate-guide-to-ai-reasoning-441b26681023
- Weights & Biases: Chain-of-Thought, Tree-of-Thought, and Graph-of-Thought - https://wandb.ai/sauravmaheshkar/prompting-techniques/reports/Chain-of-thought-tree-of-thought-and-graph-of-thought-Prompting-techniques-explained---Vmlldzo4MzQwNjMx
- HelpNet Security: Enterprise AI Agent Security 2026 - https://www.helpnetsecurity.com/2026/03/03/enterprise-ai-agent-security-2026/
- Andrii Furmanets: AI Agents 2026 Practical Architecture - https://andriifurmanets.com/blogs/ai-agents-2026-practical-architecture-tools-memory-evals-guardrails
- Microsoft Agent Governance Toolkit GitHub - https://github.com/microsoft/agent-governance-toolkit
- Microsoft Open Source Blog: Introducing the Agent Governance Toolkit - https://opensource.microsoft.com/blog/2026/04/02/introducing-the-agent-governance-toolkit-open-source-runtime-security-for-ai-agents/
- HelpNet Security: Microsoft AI Agent Governance Toolkit - https://www.helpnetsecurity.com/2026/04/03/microsoft-ai-agent-governance-toolkit/
- Claw Code Official Site - https://claw-code.codes/
- GitHub: Claw Code - Open-Source AI Coding Agent Framework - https://github.com/ultraworkers/claw-code
- GitHub: Graphiti - Build Real-Time Knowledge Graphs for AI Agents - https://github.com/getzep/graphiti
- GitHub: Letta (MemGPT) - Platform for Building Stateful Agents - https://github.com/letta-ai/letta
- Agent Patterns AI: Evaluator-Optimizer Pattern - https://www.agentpatterns.ai/agent-design/evaluator-optimizer/
- ReputAgent: Blackboard Pattern - https://reputagent.com/patterns/blackboard-pattern
- Vectorize: Best AI Agent Memory Systems 2026 - https://vectorize.io/articles/best-ai-agent-memory-systems
- Atlan: Best AI Agent Memory Frameworks 2026 - https://atlan.com/know/best-ai-agent-memory-frameworks-2026/
- AI Cost Board: AI FinOps Complete Guide - https://aicostboard.com/blog/posts/ai-finops-complete-guide
- BeyondScale: Non-Human Identity Security for AI Agents - https://beyondscale.tech/blog/non-human-identity-security-ai-agents
- arXiv: Zep Temporal Knowledge Graph Architecture for Agent Memory - https://arxiv.org/abs/2501.13956
- arXiv: Runtime Governance for AI Agents - Policies on Paths - https://arxiv.org/html/2603.16586v1
- arXiv: LLM-Based Multi-Agent Blackboard System for Information Discovery - https://arxiv.org/abs/2510.01285