LlamaIndex is an open-source data orchestration framework for building production-ready LLM applications, specializing in retrieval-augmented generation (RAG) and agentic document workflows. It connects private or domain-specific data to large language models through indexing, retrieval, and query mechanisms. Unlike general-purpose orchestration frameworks, LlamaIndex prioritizes data ingestion pipelines, advanced retrieval strategies, and context engineering, which makes it a strong choice when your application's success hinges on how well you retrieve and structure information before passing it to an LLM. The framework treats documents as first-class citizens, offering deep control over chunking, embedding, metadata extraction, hierarchical relationships, and multi-step retrieval patterns, alongside agentic workflows and MCP integration, all of which matter for knowledge-intensive production applications.
What This Cheat Sheet Covers
This topic spans 29 focused tables and 195 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Core Index Types
The index you choose decides how your data is structured and, in turn, how it gets retrieved — a vector index for semantic similarity, a tree or summary index for whole-document summarization, a property graph for relational reasoning. Picking the right one up front is the single biggest lever on retrieval quality, so it's worth knowing what each is good at before you commit a corpus to it.
| Index | Example | Description |
|---|---|---|
| VectorStoreIndex | `index = VectorStoreIndex.from_documents(docs)` | Stores vector embeddings of document chunks; retrieves via similarity search (cosine, Euclidean); the most common index for semantic retrieval in RAG. |
| SummaryIndex | `index = SummaryIndex.from_documents(docs)` | Stores nodes as a sequential chain with no complex structure; retrieves all nodes or filters by keywords; formerly called ListIndex. |
| DocumentSummaryIndex | `index = DocumentSummaryIndex.from_documents(docs)` | Extracts a summary per document and stores it alongside nodes; retrieves by matching the query to document summaries first, then fetches the relevant nodes. |
| PropertyGraphIndex | `index = PropertyGraphIndex.from_documents(docs)` | Creates a knowledge graph with entities and relationships; supports Cypher queries, hybrid search, and graph-based retrieval for complex relational data. |
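
To make the difference between the first two rows concrete, here is a minimal pure-Python sketch of the two retrieval behaviours. It is not LlamaIndex code: `ToyVectorIndex`, `ToySummaryIndex`, `embed`, and `cosine` are hypothetical names invented for illustration, and the bag-of-words "embedding" is an assumption standing in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a term-count vector (stand-in for a learned embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Hypothetical vector-style index: rank chunks by similarity to the query."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, top_k: int = 1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


class ToySummaryIndex:
    """Hypothetical summary-style index: keep chunks in order, return all of them."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def retrieve(self, query: str):
        # The query does not narrow the result; every chunk goes downstream.
        return list(self.chunks)


chunks = [
    "llamas live in the andes mountains",
    "vector stores hold embeddings of text chunks",
    "property graphs link entities with relationships",
]
query = "where are embeddings stored"

vector_hits = ToyVectorIndex(chunks).retrieve(query, top_k=1)
summary_hits = ToySummaryIndex(chunks).retrieve(query)

print(vector_hits)        # only the chunk most similar to the query
print(len(summary_hits))  # every chunk in the index
```

The vector-style index hands the LLM only the closest chunk, which suits targeted questions. The summary-style index hands over everything, which suits whole-document summarization but uses more context.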