Considerations when setting up OTK with a Multi-Node Cassandra Cluster

Document ID : KB000009806
Last Modified Date : 14/02/2018
Introduction:

The aim of this article is to shed some light on the performance ramifications of the choice of replication factor, consistency level and compaction strategy when using a Cassandra multi-node cluster for storage and retrieval of OTK information. If these factors are not set up properly, they can have a significant impact on the performance of the read/write operations requested by OTK.

 

Background:

Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients. Cassandra is used as the NoSQL datastore for the OAuth Toolkit (OTK) on the CA API Gateway.

 

Replication Factor (RF)

Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster.
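
As a minimal illustration, the replication factor is declared when a keyspace is created, using one of the strategy classes described below (the keyspace name otk_example is hypothetical and is not the actual OTK keyspace):

CREATE KEYSPACE otk_example
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };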

Two replication strategies are available: 

  • SimpleStrategy is used only for a single datacenter and one rack. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or datacenter location). A partitioner determines how data is distributed across the nodes in the cluster (including replicas). 

 

  • NetworkTopologyStrategy is used for a cluster deployed across multiple datacenters. This strategy specifies how many replicas you want in each datacenter. NetworkTopologyStrategy places replicas in the same data center by walking the ring clockwise until reaching the first node in another rack. NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.
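
As a sketch, a keyspace for a two-datacenter cluster could be defined as follows. The keyspace name and the datacenter names DC1 and DC2 are placeholders; the datacenter names must match those reported by the cluster's snitch.

CREATE KEYSPACE otk_example_multi_dc
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 2 };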

 

Consistency Level (CL)

  • Write consistency level determines the number of replicas on which the write must succeed before returning an acknowledgment to the client application. 
  • Read consistency level specifies how many replicas must respond to a read request before returning data to the client application.

 

Cassandra offers tunable data consistency across a database cluster. This means you can decide whether you want strong or eventual consistency for a particular transaction. You might want a particular request to complete if just one node responds, or you might want to wait until all nodes respond. Tunable data consistency is supported across single or multiple data centers, and you have a number of different consistency options from which to choose. Consistency is configurable on a per-query basis, meaning you can decide how strong or eventual consistency should be per SELECT, INSERT, UPDATE, and DELETE operation. For example, if you need a particular transaction to be available on all nodes throughout the world, you can specify that all nodes must respond before a transaction is marked complete. On the other hand, a less critical piece of data (e.g., a social media update) may only need to be propagated eventually, so in that case, the consistency requirement can be greatly relaxed.

 

Consistency levels in Cassandra can be configured to manage availability versus data accuracy. Consistency can be set globally for a cluster or data center, and can also be controlled on a per-operation basis. By default, consistency level LOCAL_ONE is used for all queries. The available read/write CLs are ALL, EACH_QUORUM, QUORUM, LOCAL_QUORUM, ONE, TWO, THREE, LOCAL_ONE, ANY, SERIAL and LOCAL_SERIAL (ANY applies to writes only; SERIAL and LOCAL_SERIAL are used with lightweight transactions). The QUORUM levels require a majority of replicas, i.e. (replication factor / 2) + 1 rounded down, to acknowledge the operation.
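
For example, in cqlsh the consistency level for the current session can be inspected and changed with the CONSISTENCY command (a quick sketch only; per-query CLs are normally set through the client driver or, in this context, through the Gateway assertion described under Instructions below):

-- Show the consistency level currently in effect
CONSISTENCY
-- Use LOCAL_QUORUM for subsequent reads and writes in this session
CONSISTENCY LOCAL_QUORUM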

 

Cassandra Write Path

  • Logging data in the commit log: When a write occurs, Cassandra appends the data to the sequential, memory-mapped commit log on disk. This provides configurable durability, as the commit log can be used to rebuild MemTables if a crash occurs before a MemTable is flushed to disk.
  • Writing data to the MemTable: MemTables are mutable, in-memory tables. Each physical table on each replica node has an associated MemTable. The MemTable is a write-back cache of data partitions that Cassandra looks up by key. The MemTable stores writes until it reaches a limit, and is then flushed.
  • Flushing data from the MemTable: When the MemTable contents exceed a configurable threshold, the MemTable data, which includes indexes, is put in a queue of configurable length to be flushed to disk. If the data to be flushed exceeds the queue size, Cassandra blocks writes until the next flush succeeds.
  • Storing data on disk in SSTables: Data in the commit log is purged after its corresponding data in the MemTable is flushed to an SSTable (Sorted Strings Table), a file of key/value string pairs sorted by key. MemTables and SSTables are maintained per table. SSTables are immutable and are not written to again after the MemTable is flushed.
  • Compaction: Periodic compaction is essential to a healthy Cassandra database because Cassandra does not insert/update in place. As inserts/updates occur, instead of overwriting the rows, Cassandra writes a new timestamped version of the inserted or updated data in another SSTable. Cassandra manages the accumulation of SSTables on disk using compaction.

Cassandra does not delete in place because SSTables are immutable. Instead, Cassandra marks data to be deleted using a tombstone, a marker in a row that indicates a column was deleted. Tombstones exist for a configured time period set on the table. Data in a Cassandra column can have an optional expiration called TTL (time to live); Cassandra marks TTL data with a tombstone after the requested amount of time has expired. After data is marked with a tombstone, it is removed automatically during the normal compaction process. During compaction, while the marked columns are deleted, there is a temporary spike in disk space usage and disk I/O because the old and new SSTables co-exist.
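
As an illustration of TTL behaviour (the keyspace, table and column names below are hypothetical and do not reflect the actual OTK schema):

-- Write a row that expires automatically after 3600 seconds
INSERT INTO otk_example.tokens (token_id, client_id)
  VALUES ('abc123', 'client-1')
  USING TTL 3600;

-- Check the remaining time to live on a column
SELECT TTL(client_id) FROM otk_example.tokens WHERE token_id = 'abc123';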

 

Environment:
Cassandra is supported on Mac OS X and a variety of Linux distributions. Oracle Java VM 1.8 is highly recommended for Cassandra. I used CentOS Linux release 7.2.1511 (Core), Apache Cassandra 2.2.8 and Oracle JDK 1.8.0_1-1 in this case study, together with CA API Management Gateway 9.1.00 and OTK 3.5.
Instructions:

A modular assertion was implemented to take advantage of the CL feature in Cassandra. This assertion supports two features:

  • A new cluster-wide property called cassandra.consistencyLevel: by default the value is ONE. This affects every Cassandra query across all connections. Choosing ALL, for example, would have a negative impact on performance on clusters containing nodes with high latency.
  • Override on a per-query basis: the Cassandra Query Properties window is updated to allow an appropriate CL to be selected for each query.

 

Updating the replication factor:

  • Update a keyspace in the cluster and change its replication strategy options: 

ALTER KEYSPACE "Hector" WITH REPLICATION =  { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

  • On each affected node, repair the node:

$CASSANDRA_HOME/bin/nodetool repair

  • Wait until repair completes on a node, then move to the next node.
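
To confirm that the change has taken effect, the keyspace definition and the state of the ring can be checked as follows (a sketch; "Hector" is the example keyspace from the ALTER KEYSPACE statement above):

# Show the replication settings now in effect for the keyspace
cqlsh -e 'DESCRIBE KEYSPACE "Hector";'
# Show the status and ownership of each node in the cluster
$CASSANDRA_HOME/bin/nodetool status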

 

Compaction options are configured at the table level via CQLSH. This allows each table to be optimised based on how it will be used. If a compaction strategy is not specified, SizeTieredCompactionStrategy will be used.

  ALTER TABLE mytable
    WITH compaction = { 'class' : 'SizeTieredCompactionStrategy' };

 This is the default compaction strategy. This strategy triggers a minor compaction when there are a number of similar sized SSTables on disk as configured by the table subproperty min_threshold. A minor compaction does not involve all the tables in a keyspace. Additional parameters allow STCS to be tuned to increase or decrease the number of compactions it performs and how tombstones are handled. You can manually start compaction using the nodetool compact command.
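
A hedged example of tuning STCS subproperties on a table follows; the values shown are the Cassandra defaults and are included only to illustrate the syntax, and mytable follows the example above:

  ALTER TABLE mytable
    WITH compaction = { 'class' : 'SizeTieredCompactionStrategy',
                        'min_threshold' : 4,
                        'tombstone_threshold' : 0.2 };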

 

The optimization techniques described for OTK in DOCOPS apply only to SQL-based datastores. The NoSQL policies in use do not perform any explicit deletion, because OTK relies on TTL instead of explicit deletes. It is up to the customer either to leave tombstone cleanup to Cassandra's default compaction behavior, or to use an explicit script that performs the cleanup aggressively at intervals. The volume of tombstones and the frequency of compaction affect the overall performance of the database.
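
For customers who choose to manage cleanup more aggressively, one common lever is the table-level gc_grace_seconds setting, which controls how long tombstones are retained before compaction may drop them. The following is only a sketch; the keyspace and table names are hypothetical, and the value should be chosen with the cluster's repair schedule in mind, since setting it lower than the repair interval risks deleted data reappearing.

-- Keep tombstones for one day instead of the default ten days (864000 seconds)
ALTER TABLE otk_example.tokens WITH gc_grace_seconds = 86400;

A manual major compaction can also be scheduled at intervals using the nodetool compact command mentioned above.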

 


Additional Information: