Apache Cassandra is a distributed NoSQL database built to handle massive volumes of data across commodity servers while providing high availability with no single point of failure. Originally developed at Facebook and open-sourced in 2008, it uses a masterless, peer-to-peer architecture with eventually consistent replication, which allows near-linear scalability and fault tolerance across multiple datacenters. Its write-optimized storage engine follows a log-structured merge (LSM) design. Each write is appended to a commit log and buffered in an in-memory memtable, and full memtables are flushed to immutable on-disk SSTables, so the write path avoids random disk I/O. Tunable consistency lets you choose, per query, how many replicas must acknowledge a read or write, trading latency against data correctness. The key mental model is that Cassandra, by default, trades strong consistency for availability and partition tolerance, so data modeling is query-first. Denormalize aggressively, design tables around access patterns, and accept that joins and cross-partition aggregations are expensive or impossible.
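To make the write path concrete, here is a toy, in-memory model of an LSM-style store. It is a sketch under simplifying assumptions, not Cassandra's implementation. The class name `ToyLSMStore`, the key-count `flush_threshold` (real flushes are driven by memory limits), and the plain Python lists standing in for the commit log and SSTables are all hypothetical. Timestamps, bloom filters, and compaction are also left out. A write goes to the commit log and the memtable. A full memtable becomes a new immutable sorted run. A read checks the memtable first and then the runs from newest to oldest.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


class ToyLSMStore:
    """Hypothetical LSM-style store for illustration; not Cassandra's implementation."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold              # hypothetical: flush after N keys
        self.commit_log: List[Tuple[str, str]] = []         # append-only log for crash recovery
        self.memtable: Dict[str, str] = {}                  # in-memory write buffer
        self.sstables: List[List[Tuple[str, str]]] = []     # immutable sorted runs, oldest first

    def write(self, key: str, value: str) -> None:
        # 1. Sequential append to the commit log (durability).
        self.commit_log.append((key, value))
        # 2. Update the memtable; no read-before-write is needed.
        self.memtable[key] = value
        # 3. Flush once the memtable reaches the (hypothetical) threshold.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Persist the memtable as a new key-sorted, never-modified run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Flushed data now lives in a run, so this toy discards the log entries.
        self.commit_log = []

    def read(self, key: str) -> Optional[str]:
        # Newest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            keys = [k for k, _ in run]
            i = bisect_left(keys, key)  # binary search within the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


store = ToyLSMStore(flush_threshold=2)
store.write("rider:1", "Anna")
store.write("rider:2", "Ben")       # memtable holds 2 keys -> flushed to a run
store.write("rider:1", "Anna v2")   # newer value stays in the memtable
print(store.read("rider:1"))  # Anna v2
print(store.read("rider:2"))  # Ben
print(len(store.sstables))    # 1
```

Updates never rewrite old runs in this sketch. A newer value simply shadows the older one, which is why writes stay cheap and why real LSM engines need compaction to reclaim space.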
What This Cheat Sheet Covers
This topic spans 20 focused tables and 126 indexed concepts. The outline below goes table by table, from foundational concepts to advanced details.
Table 1: Keyspace Management
| Command | Description |
|---|---|
| CREATE KEYSPACE cycling WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2}; | Creates a top-level namespace and defines its replication strategy with a replication factor per datacenter (here 3 replicas in dc1 and 2 in dc2). NetworkTopologyStrategy is the production standard. |
| CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3}; | Single-datacenter strategy. The first replica goes to the node that owns the partition's token, and the remaining replicas go to the next nodes clockwise around the ring. It is not datacenter-aware and is not recommended for production, so use it only for development or single-DC testing. |
| ALTER KEYSPACE cycling WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'dc1': 4, 'dc2': 2}; | Modifies keyspace properties such as the replication factor. The replication map is replaced as a whole, so list every datacenter that should keep replicas (omitting dc2 here would remove its replicas). The schema change takes effect immediately, but after raising a replication factor you must run a full nodetool repair so the new replicas receive existing data. |
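The placement rules in the table can be sketched with a toy token ring. This is illustrative only. The ring contents, the one-token-per-node layout (no vnodes), and the helpers `ring_walk`, `simple_strategy_replicas`, and `nts_replicas` are hypothetical. The NetworkTopologyStrategy version also ignores racks, which the real strategy takes into account. SimpleStrategy starts at the node owning the partition's token and takes the next N nodes clockwise. The simplified NTS walk follows the same clockwise order but fills each datacenter separately until its own replication factor is met.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical ring of (token, node, datacenter), sorted by token; one token per node.
RING: List[Tuple[int, str, str]] = [
    (0, "n1", "dc1"),
    (100, "n2", "dc2"),
    (200, "n3", "dc1"),
    (300, "n4", "dc2"),
    (400, "n5", "dc1"),
    (500, "n6", "dc2"),
    (600, "n7", "dc1"),
]


def ring_walk(token: int) -> List[Tuple[int, str, str]]:
    """Ring entries in clockwise order, starting at the first node whose token >= token."""
    tokens = [t for t, _, _ in RING]
    start = bisect_left(tokens, token) % len(RING)  # wrap past the highest token
    return RING[start:] + RING[:start]


def simple_strategy_replicas(token: int, rf: int) -> List[str]:
    # SimpleStrategy-like: the next rf nodes clockwise, ignoring datacenters.
    return [node for _, node, _ in ring_walk(token)[:rf]]


def nts_replicas(token: int, rf_per_dc: Dict[str, int]) -> Dict[str, List[str]]:
    # Simplified NTS: walk clockwise, filling each DC up to its own factor (racks ignored).
    placed: Dict[str, List[str]] = {dc: [] for dc in rf_per_dc}
    for _, node, dc in ring_walk(token):
        if dc in placed and len(placed[dc]) < rf_per_dc[dc]:
            placed[dc].append(node)
    return placed


print(simple_strategy_replicas(150, 3))          # ['n3', 'n4', 'n5']
print(nts_replicas(150, {"dc1": 3, "dc2": 2}))   # {'dc1': ['n3', 'n5', 'n7'], 'dc2': ['n4', 'n6']}
print(simple_strategy_replicas(650, 2))          # ['n1', 'n2'] (wraps around the ring)
```

With the cycling keyspace's {'dc1': 3, 'dc2': 2} map, each partition ends up with 5 replicas in total. SimpleStrategy, by contrast, would place three consecutive replicas regardless of which datacenter those nodes belong to.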