Chroma Vector Database Cheat Sheet

Updated 2026-05-21

Chroma is an open-source, AI-native vector database for storing, searching, and managing embeddings together with their metadata and source documents, with rich filtering on both. It sits at the heart of retrieval-augmented generation (RAG) pipelines, giving LLMs long-term memory and semantic search over private data. Chroma can run in memory for rapid prototyping, on disk for local persistence, as a remote HTTP server for self-hosted deployments, or as the fully managed Chroma Cloud for production scale, all behind the same Python and JavaScript APIs. The key mental model is that a collection acts like a smart table: every record holds an ID, an embedding vector, optional metadata key-value pairs, and an optional document string, and every operation, from insert to nearest-neighbor search, works on that single, consistent record shape.

What This Cheat Sheet Covers

This topic spans 17 focused tables and 126 indexed concepts. The outline below lists every table, from foundational concepts to advanced details.

Table 1: Client Types and Initialization
Table 2: Collection Management
Table 3: Adding and Updating Data
Table 4: Querying Collections
Table 5: Metadata Filtering Operators (where)
Table 6: Document Filtering Operators (where_document)
Table 7: Embedding Functions
Table 8: HNSW Index Configuration
Table 9: Distance Functions
Table 10: Chroma Cloud and Managed Offering
Table 11: Multi-Tenancy
Table 12: Server Deployment and Authentication
Table 13: LangChain Integration
Table 14: LlamaIndex Integration
Table 15: Multimodal Collections
Table 16: chroma-mcp (MCP Server Integration)
Table 17: Gotchas, Pitfalls, and Best Practices

Table 1: Client Types and Initialization

Choosing the right Chroma client is the first decision in every project: it determines where data lives, whether it persists, and how many processes can share it. Every client type exposes the same collection API, so moving from development to production only requires changing the client constructor.

Type	Example	Description
EphemeralClient	`import chromadb` `client = chromadb.EphemeralClient()`	In-memory only client; data is lost when the process exits. • Ideal for tests and rapid prototyping • No disk I/O overhead
PersistentClient	`client = chromadb.PersistentClient(` `path="./chroma_db")`	Writes to disk at the given `path`; data survives restarts. Default choice for local development.
HttpClient	`client = chromadb.HttpClient(` `host="localhost", port=8000)`	Connects to a separately running Chroma server (HTTP); enables multi-process and multi-client access. Recommended for production self-hosting.
