GraphRAG is a retrieval-augmented generation (RAG) approach that combines knowledge graphs with large language models to address the limitations of standard vector-based RAG. Traditional RAG retrieves text chunks by semantic similarity. GraphRAG instead extracts a structured knowledge graph from the documents (entities, relationships, and communities), then uses graph traversal and community summaries to support both local (entity-focused) and global (dataset-wide) reasoning. This enables multi-hop inference, explainable provenance, and better accuracy on complex queries whose answers depend on connections between facts rather than on any single passage. GraphRAG has a two-stage architecture: an indexing pipeline that constructs the graph (entity extraction → relationship detection → community clustering → summary generation), and a retrieval pipeline that traverses or queries the graph at inference time. The trade-offs are higher indexing cost (10–100x the token usage of vanilla RAG) and increased latency. Where relational reasoning matters (finance, healthcare, legal compliance), GraphRAG is reported to outperform embedding-only approaches by 35–46% on multi-hop benchmarks.
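
To make the retrieval stage concrete, here is a minimal sketch of a local (entity-focused) multi-hop lookup. It assumes the indexing stage has already produced (subject, relation, object) triples. The sample data and the names `TRIPLES`, `build_adjacency`, and `local_search` are hypothetical and do not come from any GraphRAG library; a real system would typically query a graph store instead of an in-memory dictionary.

```python
from collections import deque

# Hypothetical triples, standing in for the output of an indexing pipeline.
TRIPLES = [
    ("Alice", "WORKS_FOR", "TechCorp"),
    ("TechCorp", "INVESTED_IN", "DataCo"),
    ("DataCo", "LOCATED_IN", "Berlin"),
    ("Bob", "WORKS_FOR", "DataCo"),
]


def build_adjacency(triples):
    """Index triples as an undirected map: node -> [(relation, neighbor)]."""
    adj = {}
    for subj, rel, obj in triples:
        adj.setdefault(subj, []).append((rel, obj))
        adj.setdefault(obj, []).append((rel, subj))
    return adj


def local_search(adj, start, max_hops):
    """Breadth-first traversal; returns {entity: hop distance} within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for _rel, neighbor in adj.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


adj = build_adjacency(TRIPLES)
reachable = local_search(adj, "Alice", max_hops=2)
print(reachable)
```

With `max_hops=2` the traversal reaches `TechCorp` and `DataCo` from `Alice`, even though no single triple links `Alice` to `DataCo`. The entities collected this way would then be passed, with their source text, to the language model as context.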
What This Cheat Sheet Covers
This topic spans 25 focused tables and 178 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Core GraphRAG Concepts
GraphRAG fundamentally reimagines retrieval by replacing flat semantic search with structured graph reasoning. Understanding these foundational concepts clarifies why GraphRAG excels at relationship-driven queries where standard RAG fails.
| Concept | Example | Description |
|---|---|---|
| GraphRAG | Microsoft's approach: extract entities → build hierarchy → generate summaries → query via map-reduce | RAG paradigm that uses knowledge graphs instead of vector embeddings for retrieval; enables multi-hop reasoning and explainable answers |
| Knowledge graph | Nodes = Person, Organization; Edges = WORKS_FOR, INVESTED_IN | Structured representation of data as entities (nodes) connected by relationships (edges); captures semantics beyond flat text |
| Entity extraction | LLM extracts "John Smith, CEO, TechCorp" → nodes Person(John Smith), Organization(TechCorp), edge ROLE_AT | Process of identifying named entities (people, places, concepts) in unstructured text; forms graph nodes |
| Relationship detection | From text: "Alice hired Bob" → triple (Alice, HIRED, Bob) | Detecting semantic connections between entities; forms graph edges; can be LLM-based or NLP rule-based |
| Community detection | Leiden algorithm clusters related entities into communities | Graph clustering to group densely connected entities; enables hierarchical summarization at scale |
| Community summary | LLM generates: "Community 7 focuses on AI safety research, key members: Anthropic, OpenAI..." | Abstract of a detected community; generated by LLM from entities/relationships; powers global search |
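
The concepts in this table chain together into the indexing pipeline. The sketch below is a toy illustration under loud assumptions: the triples are hand-written rather than LLM-extracted, connected components stand in for the Leiden algorithm (they group anything that is linked at all, which is far cruder than Leiden's density-based clustering), and a string template stands in for the LLM-generated community summary. All function names are hypothetical.

```python
from collections import defaultdict

# Hand-written triples; a real pipeline would extract these with an LLM or NLP rules.
triples = [
    ("Alice", "HIRED", "Bob"),
    ("Bob", "WORKS_FOR", "TechCorp"),
    ("Carol", "INVESTED_IN", "DataCo"),
]


def build_graph(triples):
    """Undirected adjacency sets; relation labels stay in the triples list."""
    neighbors = defaultdict(set)
    for subj, _rel, obj in triples:
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
    return neighbors


def detect_communities(neighbors):
    """Connected components, used here as a crude stand-in for Leiden."""
    seen, communities = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbors[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarize(community_id, members, triples):
    """Template summary; a real pipeline would prompt an LLM at this step."""
    facts = [f"{s} {r} {o}" for s, r, o in triples if s in members]
    return (
        f"Community {community_id}: members {', '.join(members)}; "
        f"facts: {'; '.join(facts)}"
    )


graph = build_graph(triples)
communities = detect_communities(graph)
summaries = [summarize(i, m, triples) for i, m in enumerate(communities)]
for line in summaries:
    print(line)
```

On this sample data the sketch yields two communities, one around `Alice`, `Bob`, and `TechCorp` and one around `Carol` and `DataCo`. A global query would then be answered by mapping over these summaries instead of over raw text chunks.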