Weaviate is an open-source, AI-native vector database built in Go that stores objects and vectors together, enabling hybrid search (vector + keyword) at scale. Positioned as an AI database rather than just a vector store, Weaviate integrates deeply with embedding models (OpenAI, Cohere, Hugging Face) and generative modules for RAG patterns, offering automatic vectorization and native multi-tenancy. Unlike pure vector stores, it maintains an inverted index alongside HNSW indexes as a core architectural component, supporting complex GraphQL queries with nested cross-references and aggregations. The key mental model: Weaviate treats collections as first-class citizens with schema-defined properties, where each object can have multiple named vectors for multi-modal search, and tenants are physically isolated via dedicated shards—making it particularly strong for SaaS deployments requiring data isolation, hybrid retrieval, and production-grade filtering at billion-object scale.
What This Cheat Sheet Covers
This topic spans 34 focused tables and 138 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Collection Schema Configuration
| Property | Example | Description |
|---|---|---|
client.collections.create( name="Article", vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai()) | Specifies the embedding module to auto-vectorize data; options include text2vec-openai, text2vec-cohere, text2vec-transformers, multi2vec-clip, or none for custom vectors. | |
vector_index_config=wvc.config.Configure.VectorIndex.hnsw( distance_metric=wvc.config.VectorDistances.COSINE) | Sets the index structure — hnsw (default, fast ANN), flat (brute-force), dynamic (starts flat, converts to HNSW), or hfresh (high-churn streaming data). | |
replication_config=wvc.config.Configure.replication( factor=3) | Number of data copies stored across cluster nodes for high availability; directly multiplies storage cost and improves fault tolerance. | |
sharding_config=wvc.config.Configure.sharding( virtual_per_physical=128, desired_count=2) | Controls horizontal partitioning; desired_count sets target shards, virtual_per_physical enables flexible rebalancing across nodes. |