A feature store is a centralized data platform that manages the storage, computation, and serving of machine learning features across training and inference pipelines. It addresses training-serving consistency by ensuring that features computed during model training match those used for production predictions. Feature stores distinguish between online stores (low-latency serving for real-time inference) and offline stores (batch retrieval for training), and they enforce point-in-time correctness to prevent data leakage. The choice of architectural pattern (literal, physical, or virtual) and of platform, such as Feast, Tecton, or Hopsworks, determines whether your feature store enables operational ML at scale or becomes another data engineering burden.
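Point-in-time correctness can be illustrated with a minimal sketch using only the Python standard library. It is not the API of Feast, Tecton, or Hopsworks. The `feature_log` structure, the `point_in_time_value` helper, and the sample values are hypothetical, chosen only to show that a training row should see the latest feature value recorded at or before its label timestamp, never a later one.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical offline feature log: per entity, (event_time, value) pairs
# sorted by event_time. A real offline store would hold this in a warehouse
# or lake table.
feature_log = {
    "user_123": [
        (datetime(2024, 1, 1), 2),
        (datetime(2024, 1, 10), 5),
        (datetime(2024, 1, 20), 9),
    ],
}


def point_in_time_value(entity_id, as_of):
    """Return the latest value recorded at or before `as_of`, or None."""
    rows = feature_log.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    # bisect_right counts the rows whose timestamp is <= as_of.
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx > 0 else None


# Hypothetical training labels, each with its own observation timestamp.
labels = [
    ("user_123", datetime(2024, 1, 15), 1),
    ("user_123", datetime(2023, 12, 31), 0),
]

# Join each label to the feature value that was known at label time.
training_rows = [
    (entity, ts, point_in_time_value(entity, ts), label)
    for entity, ts, label in labels
]
```

The label observed on 2024-01-15 is joined to the value written on 2024-01-10, not the later value from 2024-01-20. Joining to the later value would leak future information into training. The label dated before any feature write gets `None`, which a real pipeline would typically impute or drop.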
What This Cheat Sheet Covers
This topic spans 13 focused tables and 104 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Core Architecture Patterns
Feature stores emerged to solve the fundamental disconnect between how data scientists build models and how production systems serve predictions. Three distinct architectural patterns have evolved, each making different trade-offs between centralization, latency, and operational overhead.
| Pattern | Example | Description |
|---|---|---|
| | `features_df = fs.compute_and_store(features=['user_30d_purchases'])` | Centralized storage that computes features via data pipelines and persists all values; data scientists retrieve precomputed features directly from the store. |
| | `fs.register_transformation(sql="SELECT user_id, SUM(amount)...", source=warehouse)` | Acts as a metadata layer mapping feature definitions to existing data sources (data warehouse, lake); features remain in original storage, and the store provides a unified access API. |
| | `fs.define_feature_view(source=kafka_topic, transformation=streaming_logic)` | Orchestrates feature computation at query time using external engines (Spark, Flink); no persistent feature storage, generates features on demand from raw data. |
| | `feature_vector = online_store.get(entity_id="user_123", features=["clicks_1h", "cart_count"])` | Low-latency key-value store (Redis, DynamoDB) serving features for real-time inference; optimized for sub-10ms point lookups by entity key. |