Milvus (Vector Database) Cheat Sheet

Updated 2026-05-21

Milvus is an open-source, cloud-native vector database built by Zilliz that stores, indexes, and searches high-dimensional embedding vectors at billion-scale. It powers AI applications (RAG pipelines, semantic search, recommendation systems, and multimodal retrieval) by making similarity search a first-class database operation. Unlike vector extensions bolted onto an existing database, Milvus separates compute from storage and uses a message-queue-backed write path (Pulsar/Kafka), so the nodes that serve searches scale independently of the nodes that ingest data. The key mental model: data is organized as collections → shards → segments, and the choice of index type and consistency level sets the trade-off between accuracy, latency, and throughput.

What This Cheat Sheet Covers

This topic spans 17 focused tables and 112 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

Table 1: Core Data Model — Collections, Partitions, Shards, and Segments
Table 2: Schema Field Types
Table 3: In-Memory CPU Index Types
Table 4: On-Disk and GPU Index Types
Table 5: Sparse and Full-Text Search
Table 6: Hybrid and Multi-Vector Search
Table 7: Similarity Metrics
Table 8: Search and Query Operations
Table 9: Index Build and Search Parameters
Table 10: Consistency Levels
Table 11: Deployment Modes
Table 12: Multi-Tenancy Strategies
Table 13: RBAC and Security
Table 14: Data Ingestion and Import
Table 15: Ecosystem Integrations
Table 16: Attu Management UI
Table 17: Observability and Operations

Table 1: Core Data Model — Collections, Partitions, Shards, and Segments

A Milvus collection is the top-level container for vectors and their associated scalar fields — analogous to a table in a relational database. Understanding how collections subdivide into partitions, shards, and segments is essential before tuning performance or planning capacity.

Concept	Example	Description
Collection	`client.create_collection(collection_name="docs", dimension=768)`	• Top-level data container holding a schema (scalar fields plus a vector field) • up to 65,536 collections per instance
Shard	`num_shards=2` in `create_collection()`	• Horizontal unit for write scaling • primary key is hashed to route inserts across shards • 1–2 shards per 50–200M entities is the recommended range • immutable after creation
Partition	`client.create_partition(collection_name="docs", partition_name="2024")`	• Logical read-time subdivision within a collection • queries can skip irrelevant partitions entirely to reduce search footprint • up to 1,024 partitions per collection
Partition Key	`schema.add_field("tenant_id", DataType.VARCHAR, max_length=64, is_partition_key=True)`	• Designates a scalar field so Milvus auto-manages partitions by hashing field values • ideal for multi-tenancy with millions of tenants • eliminates manual partition management • VARCHAR fields require `max_length` (64 here is an arbitrary example value)
Segment	(internal — not user-created)	• Smallest execution unit • intersection of shard and partition • Growing (buffering writes, unindexed) or Sealed (immutable, indexed) • default max size ~122 MB before sealing
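
The routing described in the table can be sketched as a toy model in plain Python. This is not Milvus code and does not call pymilvus. The CRC32 hash, the `ToyCollection` class, the 16-partition count, and the row-count seal threshold are all assumptions chosen for illustration; Milvus uses its own hash functions and seals segments by size, not by row count. The sketch only shows the shape of the idea: the primary key picks a shard, the partition key picks a partition, and each (shard, partition) pair holds its own growing and sealed segments.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass, field

NUM_SHARDS = 2        # mirrors num_shards=2 from the table
NUM_PARTITIONS = 16   # hypothetical partition count behind the partition key
SEAL_AFTER_ROWS = 3   # toy stand-in for the size threshold that seals a segment


def stable_hash(value) -> int:
    """Deterministic CRC32 hash; a stand-in, not Milvus's real hash function."""
    return zlib.crc32(str(value).encode("utf-8"))


@dataclass
class Segment:
    rows: list = field(default_factory=list)
    state: str = "growing"  # growing accepts writes; sealed is immutable


class ToyCollection:
    def __init__(self):
        # (shard, partition) -> list of segments; only the last may be growing
        self.segments = defaultdict(list)

    def insert(self, pk: int, tenant_id: str):
        shard = stable_hash(pk) % NUM_SHARDS                 # primary key -> shard
        partition = stable_hash(tenant_id) % NUM_PARTITIONS  # partition key -> partition
        segs = self.segments[(shard, partition)]
        if not segs or segs[-1].state == "sealed":
            segs.append(Segment())                           # open a new growing segment
        seg = segs[-1]
        seg.rows.append(pk)
        if len(seg.rows) >= SEAL_AFTER_ROWS:
            seg.state = "sealed"                             # full segments stop taking writes
        return shard, partition


coll = ToyCollection()
for pk in range(20):
    coll.insert(pk, tenant_id=f"tenant-{pk % 3}")

# Print each (shard, partition) pair with the state and row count of its segments
for (shard, partition), segs in sorted(coll.segments.items()):
    print(shard, partition, [(s.state, len(s.rows)) for s in segs])
```

Because the shard is derived from the primary key and the partition from `tenant_id`, rows for one tenant share a partition but can still spread across shards, which is why a partition-scoped search can skip most segments while writes remain distributed.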
