What is Apache Cassandra?¶
Apache Cassandra is a distributed, wide-column NoSQL database designed for high availability and linear scalability. It handles massive amounts of data across commodity servers with no single point of failure, making it the database of choice for applications that cannot afford downtime.
This guide goes beyond surface-level descriptions to explain why Cassandra works the way it does, when it is the right choice, and what tradeoffs are involved.
The Problem Cassandra Solves¶
Before understanding Cassandra, understand the problem it was built to solve.
The Scalability Wall¶
Traditional relational databases scale vertically - when more capacity is needed, a bigger server is purchased:
Year 1: 1 server, 16GB RAM, 500GB disk → Works great
Year 2: 1 server, 64GB RAM, 2TB disk → Still okay
Year 3: 1 server, 256GB RAM, 10TB disk → Expensive, but works
Year 4: ??? Maximum server size reached → NOW WHAT?
At some point, a wall is hit. The biggest server money can buy is not enough. Data must be spread across multiple machines, but traditional databases were not designed for this.
The Availability Problem¶
Even with an infinitely large server, it would still be a single point of failure. When that server goes down - and it will - the entire application is offline.
Traditional solutions (primary-replica replication, clustering) help but introduce complexity and failure modes:
Primary-Replica Problems:
- Primary fails → manual failover required (downtime)
- Replication lag → stale reads from replicas
- Split brain → both think they're primary (data corruption)
- Write scalability → still limited to single primary
The Geographic Distribution Problem¶
Global applications need data close to users. A user in Tokyo should not wait 200ms for every database query to reach a server in Virginia. But traditional databases were not built for multi-region deployment.
How Cassandra Solves These Problems¶
Cassandra was designed from the ground up to address these challenges:
1. Linear Horizontal Scalability¶
Need more capacity? Add more nodes. The relationship is linear:
3 nodes → handles X requests/second
6 nodes → handles 2X requests/second
12 nodes → handles 4X requests/second
No architectural limit. Companies run 1000+ node clusters.
This works because: - Data is automatically distributed across nodes using consistent hashing - Each node is responsible for a portion of the data - Adding nodes automatically rebalances data
2. No Single Point of Failure¶
Every node in a Cassandra cluster is identical. There is no primary, no leader, no special node:
Data is replicated to multiple nodes. When a node fails: - Other nodes have copies of its data - Requests are automatically routed to surviving nodes - When the node recovers, it automatically catches up
3. Multi-Datacenter by Design¶
Cassandra was built for geographic distribution:
- Each region has local replicas for low latency
- Cross-datacenter consistency is tunable per query
- Each region can operate independently during network partitions
Origin Story¶
Cassandra's architecture makes more sense when understanding why it was built.
Facebook's Inbox Search Problem (2007)¶
Facebook needed to store and search billions of messages. Requirements: - Handle 100+ million users - Millisecond search latency - Never go offline (users expect 24/7 availability) - Scale without rebuilding
Existing solutions didn't work: - MySQL could not scale writes - Oracle was too expensive - Existing NoSQL options (memcached) could not persist reliably
The Dynamo + BigTable Combination¶
Facebook engineers Avinash Lakshman and Prashant Malik combined:
Amazon Dynamo (2007): - Consistent hashing for data distribution - Gossip protocol for cluster membership - Tunable consistency - Sloppy quorum for availability
Google BigTable (2006): - Wide-column data model - Log-structured storage (LSM trees) - Column families
The result was Cassandra - open-sourced in 2008, became Apache top-level project in 2010.
Historical Context¶
Understanding Cassandra's heritage explains its behavior:
| Inherited From | Feature | Implication |
|---|---|---|
| Dynamo | Eventual consistency | Must design for it |
| Dynamo | No single point of failure | True peer-to-peer |
| Dynamo | Tunable consistency | Application chooses |
| BigTable | Column-oriented storage | Efficient for wide rows |
| BigTable | LSM trees | Fast writes, slower reads |
The CAP Theorem and Cassandra¶
Understanding Cassandra requires understanding the CAP theorem.
CAP Theorem Explained¶
The CAP theorem states that a distributed system can only guarantee two of three properties:
- Consistency: Every read receives the most recent write
- Availability: Every request receives a response
- Partition tolerance: System continues despite network failures
Cassandra's Choice: AP (with Tunable C)¶
In the real world, network partitions will happen. P cannot be given up. So the choice is between:
- CP systems: When partition occurs, refuse requests (maintain consistency, sacrifice availability)
- AP systems: When partition occurs, continue serving requests (maintain availability, sacrifice consistency)
Cassandra chose AP - it will always respond, even if the response might be slightly stale.
BUT: Cassandra allows tuning consistency per operation. Strong consistency is available when needed:
-- AP behavior: fast, available, might be stale
CONSISTENCY ONE;
SELECT * FROM users WHERE user_id = ?;
-- CP behavior: slower, might fail if nodes down, always consistent
CONSISTENCY ALL;
SELECT * FROM users WHERE user_id = ?;
-- The sweet spot for most applications
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM users WHERE user_id = ?;
How Cassandra Actually Works¶
The Write Path¶
When data is written to Cassandra:
Why writes are fast:
- Commit log is append-only - Sequential disk writes are 100x faster than random
- Memtable is in memory - Microsecond latency
- No read-before-write - Unlike B-trees, no need to read existing data
- Async replication option - Can acknowledge before all replicas confirm
The Read Path¶
When data is read:
Why reads can be slower:
- Data might be spread across multiple SSTables
- Need to merge results from memtable + multiple SSTables
- Tombstones (deletes) must be processed
- May need to query multiple replicas
This is the fundamental tradeoff: Cassandra optimizes writes at the expense of reads.
Compaction: The Background Worker¶
Over time, writes create many SSTables. Compaction merges them:
Compaction strategies (each optimized for different workloads):
| Strategy | Best For | Characteristics |
|---|---|---|
| STCS (Size-Tiered) | Write-heavy | Groups similar-sized SSTables |
| LCS (Leveled) | Read-heavy | Guarantees bounded read amplification |
| TWCS (Time-Window) | Time-series | Efficient for TTL data |
| UCS (Unified) | General (5.0+) | Adaptive, combines benefits |
When to Use Cassandra¶
Cassandra Excels At¶
1. High Throughput at Scale
Cassandra handles both high write AND read throughput when data is modeled correctly:
Write-heavy examples:
- IoT sensor data (millions of devices writing continuously)
- Clickstream/event tracking
- Log aggregation
- Time-series metrics
Read-heavy examples:
- User profile lookups (billions of reads/day)
- Product catalogs
- Session stores
- Recommendation serving
Mixed workloads:
- Messaging systems (write messages, read conversations)
- Social feeds (write posts, read timelines)
With proper data modeling and caching (key cache, row cache), Cassandra serves reads with single-digit millisecond latency at massive scale.
2. High Availability Requirements
If downtime costs money or reputation:
Use Case Examples:
- E-commerce (cannot miss orders)
- Financial transactions
- Healthcare systems
- Gaming (live services)
- Any 24/7 service
3. Geographic Distribution
If users are global and latency matters:
Use Case Examples:
- Social media platforms
- Content delivery
- Global SaaS applications
- Multi-region applications
4. Linear Scalability Needs
If data growth is unpredictable or potentially massive:
Use Case Examples:
- Startups expecting growth
- Platforms with viral potential
- Data-intensive applications
Production Scale Examples¶
| Company | Use Case | Scale |
|---|---|---|
| Apple | iCloud, Apple Music, Siri | 150,000+ nodes, 10+ PB |
| Netflix | Streaming, recommendations | 500+ nodes per cluster |
| Direct messages, feeds | Millions of writes/second | |
| Uber | Driver/rider matching, maps | Billions of requests/day |
| Discord | Messages, presence | Billions of messages, trillions of reads |
| Spotify | User data, playlists | Thousands of nodes |
When NOT to Use Cassandra¶
Cassandra is not a general-purpose database. It has specific tradeoffs that make it wrong for certain use cases—though several traditional limitations are being addressed in recent releases.
1. Complex Joins and Aggregations
-- CASSANDRA CANNOT DO THIS:
SELECT u.name, COUNT(o.id), SUM(o.total)
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.name
HAVING SUM(o.total) > 1000;
-- Data must be denormalized or use Spark for analytics
-- Use: PostgreSQL, ClickHouse, Snowflake for analytical queries
This is a fundamental architectural choice—Cassandra optimizes for distributed writes, not relational queries.
2. Small Datasets Without Availability Requirements
If data fits on one server AND availability is not critical, simpler options exist:
Consider simpler alternatives when:
- Downtime during maintenance is acceptable
- Single-region deployment is sufficient
- Team lacks distributed systems expertise
If availability IS critical: Cassandra is appropriate regardless of dataset size
Many organizations run Cassandra for small datasets specifically because they cannot tolerate downtime.
3. Rapidly Evolving Query Patterns
If query patterns change frequently and unpredictably:
Cassandra data modeling principle:
- Design tables around queries (query-first design)
- Adding new query patterns may require new tables
- Schema changes at scale require careful planning
For exploratory/evolving queries: PostgreSQL, Elasticsearch
Limitations That Are Changing¶
The following traditional limitations are being addressed in Cassandra 5.x:
Ad-hoc Queries (Improved in 5.0)
Storage Attached Indexes (SAI) in Cassandra 5.0 significantly improve secondary index performance:
-- BEFORE 5.0: Secondary indexes were expensive and limited
-- WITH SAI (5.0+): Much more practical for non-partition-key queries
CREATE INDEX ON users (email) USING 'sai';
SELECT * FROM users WHERE email = '[email protected]';
-- SAI provides ~40% better throughput and ~230% better latency
-- than legacy secondary indexes
SAI does not make Cassandra equivalent to a relational database for ad-hoc queries, but it substantially expands what is practical.
Multi-Partition ACID Transactions (Coming in 5.1+)
The Accord protocol (CEP-15) brings general-purpose ACID transactions to Cassandra:
-- COMING WITH ACCORD (5.1+):
BEGIN TRANSACTION
UPDATE accounts SET balance = balance - 100 WHERE id = 'sender';
UPDATE accounts SET balance = balance + 100 WHERE id = 'receiver';
COMMIT;
Accord is a leaderless consensus protocol that enables:
- Multi-partition transactions across any set of keys
- Strict serializable isolation
- Single wide-area round-trip for cross-region transactions
- No single point of failure (unlike leader-based approaches)
Current Status
Accord is under active development. For applications requiring multi-partition transactions today, consider PostgreSQL, CockroachDB, or implement application-level saga patterns.
Strong Consistency Trade-offs
Cassandra has always supported strong consistency via QUORUM and LWT, but at the cost of availability during partitions. With Accord:
- Strict serializability without sacrificing Cassandra's distributed architecture
- Better performance than Paxos-based LWT for complex operations
- Maintains Cassandra's peer-to-peer, no-single-point-of-failure design
Cassandra vs Other Databases¶
Cassandra vs MongoDB¶
| Aspect | Cassandra | MongoDB |
|---|---|---|
| Data Model | Wide-column (table-like) | Document (JSON) |
| Schema | Schema-per-table | Schema-per-collection (flexible) |
| Scalability | Peer-to-peer, linear | Sharded, with config servers |
| Consistency | Tunable (default: eventual) | Strong for single doc |
| Joins | None | $lookup (limited) |
| Secondary Indexes | Expensive | Native support |
| Write Performance | Excellent | Good |
| Read Performance | Good (with right model) | Excellent |
| Operations | Complex | Easier |
| Best For | Scale, availability, writes | Flexibility, documents |
Choose Cassandra when: Write-heavy, need linear scale, multi-DC required Choose MongoDB when: Flexible schema, document-oriented, smaller scale
Cassandra vs PostgreSQL¶
| Aspect | Cassandra | PostgreSQL |
|---|---|---|
| Type | Distributed NoSQL | Relational |
| Scaling | Horizontal (add nodes) | Vertical (bigger server) |
| Consistency | Tunable | Strong ACID |
| Transactions | Single-partition LWT | Full ACID |
| Schema | Must design for queries | Normalized design |
| Joins | Not supported | Full support |
| Indexes | Limited | Rich (B-tree, GIN, GiST) |
| Query Flexibility | Limited | Full SQL |
| Availability | Native HA | Requires setup |
| Best For | Scale, availability | Complex queries, transactions |
Choose Cassandra when: Scaling beyond single server, HA is critical Choose PostgreSQL when: Complex queries, transactions, < 100GB data
Cassandra vs DynamoDB¶
| Aspect | Cassandra | DynamoDB |
|---|---|---|
| Deployment | Self-managed or DBaaS | AWS-managed |
| Vendor | Open source | AWS only |
| Pricing | Infrastructure | Per-request or capacity |
| Multi-Region | Built-in | Global Tables ($$$) |
| Query Language | CQL | PartiQL / API |
| Flexibility | Full control | Limited configuration |
| Expertise Required | High | Lower |
| Cost at Scale | Lower | Higher |
Choose Cassandra when: Vendor independence, multi-cloud, cost control at scale Choose DynamoDB when: AWS-only, minimal ops, getting started quickly
Cassandra vs ScyllaDB¶
| Aspect | Cassandra | ScyllaDB |
|---|---|---|
| Language | Java | C++ |
| Performance | Good (gap narrowing with 5.0+) | Faster (historically 3-10x, now closer) |
| Resource Usage | Improved with modern GCs | More efficient |
| Compatibility | Original | CQL compatible |
| Community | Large, mature | Growing |
| Features | All features (Accord, SAI, Vector) | Most features |
| Support | Apache, vendors | ScyllaDB Inc |
| JDK Options | JDK 17+ with Shenandoah/ZGC | N/A |
The performance gap has narrowed significantly:
- Cassandra 5.0 includes substantial performance improvements
- JDK 17+ with Shenandoah or ZGC dramatically reduces GC pause times
- Modern Cassandra deployments see much smaller performance differences than historical benchmarks suggest
Choose Cassandra when: Need latest features (Accord, SAI), larger ecosystem, JVM expertise Choose ScyllaDB when: Maximum single-node throughput, want to avoid JVM tuning
Essential Terminology¶
Cluster Topology¶
| Term | Definition | Analogy |
|---|---|---|
| Cluster | All nodes working together | The entire database |
| Node | Single Cassandra instance | One server |
| Datacenter (DC) | Logical grouping of nodes | Usually maps to physical DC or region |
| Rack | Subdivision of DC | For failure isolation |
| Token | Hash value identifying data ownership | Node's "address" on the ring |
| Token Range | Range of tokens a node owns | Node's "slice" of data |
| Seed Node | Bootstrap node for gossip | How new nodes find the cluster |
Data Model¶
| Term | Definition | SQL Equivalent |
|---|---|---|
| Keyspace | Container for tables | Database |
| Table | Collection of partitions | Table |
| Partition | Unit of distribution | No equivalent |
| Partition Key | Determines data location | Part of PRIMARY KEY |
| Clustering Column | Sorts within partition | ORDER BY column |
| Row | Single record | Row |
| Cell | Single column value | Cell |
| TTL | Auto-expire data | No equivalent |
| Tombstone | Marker for deleted data | No equivalent |
Operations¶
| Term | Definition |
|---|---|
| Replication Factor (RF) | Copies of data stored |
| Consistency Level (CL) | Replicas required to respond |
| Quorum | Majority of replicas ((RF/2) + 1) |
| Compaction | Merging SSTables |
| Repair | Synchronizing replicas |
| Gossip | Node-to-node state sharing |
| Hinted Handoff | Store-and-forward for down nodes |
| Read Repair | Fix inconsistencies on read |
| Anti-entropy Repair | Full replica synchronization |
Getting Started¶
After understanding what Cassandra is and when to use it:
Learning Path¶
1. Install Cassandra → Get it running
└── installation/index.md
2. First Cluster → Multi-node setup
└── first-cluster.md
3. CQL Basics → Write queries
└── quickstart-cql.md
4. Data Modeling → Design schemas
└── ../data-modeling/index.md
5. Architecture Deep Dive → Understand internals
└── ../architecture/index.md
Try It Now (5 minutes)¶
# Start Cassandra with Docker
docker run --name cassandra -d -p 9042:9042 cassandra:5.0
# Wait for startup
sleep 60
# Connect
docker exec -it cassandra cqlsh
# Create and query data
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE test;
CREATE TABLE users (id uuid PRIMARY KEY, name text);
INSERT INTO users (id, name) VALUES (uuid(), 'Test User');
SELECT * FROM users;
Additional Resources¶
Papers and Technical Deep Dives¶
- Dynamo: Amazon's Highly Available Key-value Store - The foundation
- Bigtable: A Distributed Storage System - Column-family model
- Cassandra - A Decentralized Structured Storage System - Original Cassandra paper
Learning Resources¶
- Apache Cassandra Documentation
- DataStax Academy - Free certification courses
- The Last Pickle Blog - Deep technical articles
- AxonOps Blog - Operational guides
Community¶
Next: Install Cassandra - Get Cassandra running