Weaviate is an open-source, cloud-native vector database that stores data objects together with their vector embeddings, enabling both semantic (vector) and keyword (BM25) search in a single system. It commonly sits at the center of RAG (Retrieval-Augmented Generation) pipelines, and its integrated generative AI modules let retrieval and generation happen in one query. Unlike standalone vector stores, Weaviate ships with a full schema system, inverted indexes, multi-tenancy, replication, and a pluggable module ecosystem. This leads to the key mental model: each "collection" in Weaviate is simultaneously a vector index, an inverted index, and a structured object store.
What This Cheat Sheet Covers
This topic spans 17 focused tables and 166 indexed concepts. Below is a complete table-by-table outline, running from foundational concepts through advanced details.
Table 1: Collection (Class) Definition - Top-Level Parameters
A Weaviate collection (formerly called a "class") is the core schema unit that defines how objects are stored, vectorized, and indexed. Plan every collection carefully before import, because several top-level settings (vectorizer, index type, sharding, and multi-tenancy) are immutable after creation.
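Because these settings cannot be changed in place, it can help to diff a proposed definition against the current one before attempting an update. The following is a minimal sketch, not part of any Weaviate client: immutable_changes is a hypothetical helper, and the key names shardingConfig and multiTenancyConfig are assumed to match the top-level fields of the REST schema.

```python
# Hypothetical pre-flight check: NOT part of any Weaviate client.
# Compares an existing collection definition with a proposed one and reports
# top-level settings that this cheat sheet describes as immutable.
from typing import Any

# Top-level keys treated as immutable in this sketch (assumed REST schema names).
IMMUTABLE_KEYS = (
    "class",
    "vectorizer",
    "vectorIndexType",
    "shardingConfig",
    "multiTenancyConfig",
)


def immutable_changes(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """Return the immutable top-level keys whose values differ between old and new."""
    changed = []
    for key in IMMUTABLE_KEYS:
        # A key missing from both definitions compares as None == None (unchanged).
        if old.get(key) != new.get(key):
            changed.append(key)
    return changed


current = {
    "class": "Article",
    "description": "News articles",
    "vectorizer": "text2vec-openai",
    "vectorIndexType": "hnsw",
}
# description is mutable; vectorIndexType is not.
proposed = dict(current, description="Wire-service news articles", vectorIndexType="flat")

print(immutable_changes(current, proposed))  # ['vectorIndexType']
```

The comparison is deliberately coarse: it treats each listed value as a whole, so any difference inside shardingConfig or multiTenancyConfig is flagged, and changing one of these settings in practice means creating a new collection and re-importing.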
| Parameter | Example | Description |
|---|---|---|
| class | "Article" | Collection name; must start with an uppercase letter; immutable after creation. |
| description | "News articles" | Human-readable documentation string; mutable. |
| vectorizer | "text2vec-openai" | Module that automatically generates vectors at import and query time; immutable after creation; set "none" to bring your own vectors (BYOV). |
| vectorIndexType | "hnsw" | Vector index algorithm: hnsw (default), flat, dynamic, or hfresh; immutable after creation. |
| vectorIndexConfig | {"ef": 64, "efConstruction": 128} | Fine-tunes the chosen index (e.g., HNSW ef, efConstruction, maxConnections); partially mutable. |
| vectorConfig | {"title": {"vectorizer": {...}, "vectorIndexConfig": {...}}} | Named vectors: defines multiple independent vector spaces per object, each with its own vectorizer and index. |
| properties | [{"name": "title", "dataType": ["text"]}] | Array of property definitions; new properties can be added later, but existing ones cannot be deleted. |
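To see how the parameters in the table fit together, the sketch below assembles them into one collection definition and runs a few local sanity checks before anything is sent to a server. It is a hedged illustration: validate_collection is a hypothetical helper, the class-name regex is an assumed approximation of Weaviate's naming rule, and the "body" property is an invented example field.

```python
# Sketch: assemble a collection definition from the Table 1 parameters and run
# a few local sanity checks. validate_collection is hypothetical, not a Weaviate API.
import json
import re
from typing import Any

# Allowed values taken from the vectorIndexType row of the table.
VECTOR_INDEX_TYPES = {"hnsw", "flat", "dynamic", "hfresh"}
# Assumed naming rule: uppercase first letter, then letters, digits, or underscores.
CLASS_NAME_RE = re.compile(r"^[A-Z][A-Za-z0-9_]*$")


def validate_collection(defn: dict[str, Any]) -> list[str]:
    """Return a list of problems found in defn (empty if none were found)."""
    problems = []
    if not CLASS_NAME_RE.match(defn.get("class", "")):
        problems.append("class must start with an uppercase letter")
    index_type = defn.get("vectorIndexType", "hnsw")  # hnsw is the default
    if index_type not in VECTOR_INDEX_TYPES:
        problems.append(f"unknown vectorIndexType: {index_type}")
    # Each property needs at least a name and a dataType, as in the table's example.
    for prop in defn.get("properties", []):
        if "name" not in prop or "dataType" not in prop:
            problems.append(f"incomplete property: {prop}")
    return problems


article = {
    "class": "Article",
    "description": "News articles",
    "vectorizer": "text2vec-openai",
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {"ef": 64, "efConstruction": 128},
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "body", "dataType": ["text"]},  # hypothetical extra property
    ],
}

print(validate_collection(article))  # []
print(json.dumps(article, indent=2))
```

The JSON printed at the end mirrors the shape of the table's examples; to our understanding it resembles the body accepted by Weaviate's REST schema endpoint, but check the documentation for your server version before relying on it.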