Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients. Cassandra is used as the NoSQL datastore for the OAuth Toolkit on the CA API Gateway.
Replication Factor (RF)
Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster.
Two replication strategies are available:
- SimpleStrategy is used only for a single datacenter and one rack. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or datacenter location). A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
- NetworkTopologyStrategy is used for a cluster deployed across multiple datacenters. This strategy specifies how many replicas you want in each datacenter. NetworkTopologyStrategy places replicas in the same data center by walking the ring clockwise until reaching the first node in another rack. NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.
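The clockwise ring walk that SimpleStrategy performs can be sketched in a few lines of Python. This is a simplified model for illustration only: real Cassandra hashes partition keys with a partitioner (such as Murmur3Partitioner) to get a token, and typically assigns many virtual nodes (vnodes) per physical node.

```python
# Simplified model of SimpleStrategy replica placement (illustration only;
# real Cassandra derives the token from the partition key via a partitioner).

def simple_strategy_replicas(ring, token, rf):
    """ring: list of (token, node) pairs sorted by token.
    Returns the rf distinct nodes holding replicas for the given token."""
    # Find the first node whose token is >= the data's token (wrap to 0 if none).
    start = next((i for i, (t, _) in enumerate(ring) if t >= token), 0)
    replicas = []
    i = start
    while len(replicas) < rf:
        node = ring[i % len(ring)][1]
        if node not in replicas:   # with vnodes, skip nodes already chosen
            replicas.append(node)
        i += 1
    return replicas

ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(simple_strategy_replicas(ring, token=30, rf=3))  # ['C', 'D', 'A']
```

NetworkTopologyStrategy follows the same walk but additionally skips nodes until it reaches one on a different rack, which is why it tolerates rack-level failures better.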
Consistency Level (CL)
- Write consistency level determines the number of replicas on which the write must succeed before returning an acknowledgment to the client application.
- Read consistency level specifies how many replicas must respond to a read request before returning data to the client application.
Cassandra offers tunable data consistency across a database cluster. This means you can decide whether you want strong or eventual consistency for a particular transaction. You might want a particular request to complete if just one node responds, or you might want to wait until all nodes respond. Tunable data consistency is supported across single or multiple data centers, and you have a number of different consistency options from which to choose. Consistency is configurable on a per-query basis, meaning you can decide how strong or eventual consistency should be per SELECT, INSERT, UPDATE, and DELETE operation. For example, if you need a particular transaction to be available on all nodes throughout the world, you can specify that all nodes must respond before a transaction is marked complete. On the other hand, a less critical piece of data (e.g., a social media update) may only need to be propagated eventually, so in that case, the consistency requirement can be greatly relaxed.
Consistency levels in Cassandra can be configured to manage availability versus data accuracy. You can configure consistency on a cluster, data center, or individual I/O operation basis. Consistency among participating nodes can be set globally and also controlled on a per-operation basis. By default, the consistency level LOCAL_ONE is used for all queries. Available consistency levels: ALL, EACH_QUORUM, QUORUM, LOCAL_QUORUM, ONE, TWO, THREE, LOCAL_ONE, ANY, SERIAL, and LOCAL_SERIAL (ANY applies only to writes; SERIAL and LOCAL_SERIAL apply only to reads).
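The quorum-based levels reduce to simple arithmetic: a quorum is a majority of replicas, and a read is guaranteed to see the most recent acknowledged write whenever the read and write consistency levels together cover more than the replication factor (R + W > RF). A small sketch:

```python
# Quorum arithmetic behind Cassandra's tunable consistency (illustration).

def quorum(rf):
    # A quorum is a majority of the replicas: floor(rf / 2) + 1.
    return rf // 2 + 1

def is_strongly_consistent(read_replicas, write_replicas, rf):
    # A read is guaranteed to overlap the latest write when R + W > RF.
    return read_replicas + write_replicas > rf

rf = 3
print(quorum(rf))                                          # 2
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))  # True: QUORUM + QUORUM
print(is_strongly_consistent(1, 1, rf))                    # False: ONE + ONE may read stale data
```

This is why QUORUM reads paired with QUORUM writes give strong consistency, while ONE/ONE gives only eventual consistency with better latency.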
Cassandra Write Path
- Logging data in the commit log: When a write occurs, Cassandra appends the data to a sequential, memory-mapped commit log on disk. This provides configurable durability: the commit log can be used to rebuild memtables if a crash occurs before a memtable is flushed to disk.
- Writing data to the memtable: Memtables are mutable, in-memory tables that are read/write. Each physical table on each replica node has an associated memtable. The memtable is a write-back cache of data partitions that Cassandra looks up by key. The memtable stores writes until reaching a limit, and then is flushed.
- Flushing data from the memtable: When memtable contents exceed a configurable threshold, the memtable data, which includes indexes, is put in a queue with a configurable length, to be flushed to disk. If the data to be flushed exceeds the queue size, Cassandra blocks writes until the next flush succeeds.
- Storing data on disk in SSTables: Data in the commit log is purged after its corresponding data in the memtable is flushed to an SSTable. An SSTable (Sorted String Table) is a file of key/value string pairs, sorted by key. Memtables and SSTables are maintained per table. SSTables are immutable and are not written to again after the memtable is flushed.
- Compaction: Periodic compaction is essential to a healthy Cassandra database because Cassandra does not insert/update in place. As inserts/updates occur, instead of overwriting the rows, Cassandra writes a new timestamped version of the inserted or updated data in another SSTable. Cassandra manages the accumulation of SSTables on disk using compaction.
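The steps above can be sketched as a toy model. This is an illustration of the flow only, not Cassandra's implementation; the class name, threshold, and in-memory "disk" are all invented for the example:

```python
# Toy model of the Cassandra write path (illustration only): commit-log
# append, memtable insert, and flush to an immutable, sorted "SSTable".

class ToyTable:
    def __init__(self, flush_threshold=3):
        self.commit_log = []             # sequential append-only log (durability)
        self.memtable = {}               # mutable in-memory writes, keyed by key
        self.sstables = []               # immutable sorted files on "disk"
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. append to the commit log
        self.memtable[key] = value             # 2. apply the write to the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                       # 3. flush once over the threshold

    def flush(self):
        # Write the memtable out as a sorted, immutable SSTable, then purge
        # the commit log entries it covers.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []

t = ToyTable()
for k, v in [("c", 1), ("a", 2), ("b", 3)]:
    t.write(k, v)
print(t.sstables)   # [[('a', 2), ('b', 3), ('c', 1)]] -- sorted by key
print(t.memtable)   # {} -- flushed after the third write
```

Note how each flush produces a new SSTable rather than rewriting an old one; this is what makes compaction necessary.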
Cassandra does not delete in place because SSTables are immutable. Instead, Cassandra marks data to be deleted using a tombstone. A tombstone is a marker in a row that indicates a column was deleted. Tombstones persist for a configurable grace period set on the table (gc_grace_seconds). Data in a Cassandra column can have an optional expiration period called TTL (time to live); Cassandra marks TTL data with a tombstone after the requested amount of time has expired. After data is marked with a tombstone, it is removed automatically during the normal compaction process. During compaction, while marked columns are deleted, there is a temporary spike in disk space usage and disk I/O because the old and new SSTables co-exist.
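The tombstone lifecycle can be sketched as a merge over SSTables. This is a simplified illustration, not Cassandra's compaction code: the data layout and function names are invented, and real compaction also handles per-cell TTLs, range tombstones, and multiple compaction strategies.

```python
# Toy sketch of tombstones and compaction (illustration only): deletes become
# tombstones; compaction merges SSTables, keeps the newest timestamped version
# of each column, and drops tombstoned data once the grace period has passed.

TOMBSTONE = object()   # sentinel standing in for a deletion marker

def compact(sstables, now, gc_grace_seconds):
    """sstables: list of dicts mapping key -> (value, write_timestamp).
    Returns one merged table; the newest version of each key wins."""
    merged = {}
    for table in sstables:
        for key, (value, ts) in table.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    # Drop tombstones (and the data they shadow) older than the grace period.
    return {k: v for k, v in merged.items()
            if not (v[0] is TOMBSTONE and now - v[1] > gc_grace_seconds)}

old = {"x": ("v1", 100), "y": ("v1", 100)}
new = {"x": (TOMBSTONE, 200)}   # "x" was deleted at t=200
print(compact([old, new], now=10000, gc_grace_seconds=900))
# {'y': ('v1', 100)} -- 'x' and its tombstone are gone after the grace period
```

Keeping tombstones for the grace period (rather than dropping them at once) gives replicas that missed the delete time to learn about it; otherwise a stale replica could resurrect the deleted data.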