Tuesday, 3 March 2020

Cassandra: How is data written?

In this post, I am going to explain how Cassandra writes data to the commit log, the memtable and SSTables.

When data is written to a Cassandra node, the node stores it in an in-memory structure called the ‘memtable’ and also appends the same data to the commit log on disk.
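To make the two destinations of a write concrete, here is a minimal toy sketch in Python. It is not Cassandra's code: the `ToyNode` class and the JSON-lines log format are hypothetical, and the sketch fsyncs on every write for simplicity, whereas Cassandra's commit log sync behavior is configurable.

```python
import json
import os
import tempfile


class ToyNode:
    """Hypothetical model of a node's write path (not Cassandra's code)."""

    def __init__(self, commit_log_path):
        self.commit_log_path = commit_log_path
        self.memtable = {}  # in-memory table: key -> latest value

    def write(self, key, value):
        # 1. Append the mutation to the commit log on disk.
        #    (Simplification: fsync on every write; Cassandra makes this configurable.)
        with open(self.commit_log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Apply the same mutation to the in-memory memtable.
        self.memtable[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "commitlog.txt")
node = ToyNode(log_path)
node.write("user:1", "alice")
node.write("user:2", "bob")
print(node.memtable)  # {'user:1': 'alice', 'user:2': 'bob'}
```

Every write therefore lands in two places: a fast in-memory structure for reads, and a durable append-only file on disk.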

Why write data to the commit log?
If data were written only to the memtable, a node going down (for example, because of a power failure) would lose everything still held in memory. To guarantee durability, Cassandra also writes the data to the commit log, which is replayed when the node restarts.
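The recovery side of this durability guarantee can be sketched as replaying the log. This is a toy illustration under assumed names (`append_to_commit_log`, `replay_commit_log` and the JSON-lines format are all hypothetical), showing only the idea that re-applying logged mutations in order rebuilds a lost memtable.

```python
import json
import os
import tempfile


def append_to_commit_log(path, key, value):
    # Append one mutation per line to the hypothetical commit log file.
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")


def replay_commit_log(path):
    # Rebuild the memtable by re-applying every logged mutation in order,
    # so a later write to the same key overwrites an earlier one.
    memtable = {}
    if not os.path.exists(path):
        return memtable
    with open(path, encoding="utf-8") as log:
        for line in log:
            entry = json.loads(line)
            memtable[entry["key"]] = entry["value"]
    return memtable


log_path = os.path.join(tempfile.mkdtemp(), "commitlog.txt")
memtable = {}
for key, value in [("k1", "v1"), ("k2", "v2"), ("k1", "v3")]:
    append_to_commit_log(log_path, key, value)
    memtable[key] = value

# Simulate a power failure: the in-memory memtable is lost.
memtable = None

# On restart, replaying the commit log restores the lost writes.
recovered = replay_commit_log(log_path)
print(recovered)  # {'k1': 'v3', 'k2': 'v2'}
```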

Why flush data from the memtable?
Memory is limited, so when a memtable grows past a configured threshold, Cassandra flushes its contents to disk as an SSTable (Sorted String Table). Flushed SSTable files are immutable: once written, they are never changed. As SSTables accumulate, Cassandra periodically merges them (a process called compaction) to reduce read overhead.
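A toy sketch of flushing and merging follows. The names `flush`, `compact` and `FLUSH_THRESHOLD` are hypothetical, tuples stand in for immutable on-disk files, and the threshold counts keys rather than memory. It also resolves duplicate keys by SSTable order (newer wins), whereas real Cassandra compares write timestamps.

```python
def flush(memtable):
    # Flush: write the memtable out as an immutable, key-sorted "SSTable".
    # A tuple of (key, value) pairs stands in for the on-disk file.
    return tuple(sorted(memtable.items()))


def compact(sstables):
    # Merge several SSTables into one. In this sketch later SSTables are
    # newer, so their values win when the same key appears more than once.
    merged = {}
    for sstable in sstables:  # oldest first
        for key, value in sstable:
            merged[key] = value
    return tuple(sorted(merged.items()))


FLUSH_THRESHOLD = 2  # hypothetical: flush when the memtable holds 2 keys
memtable, sstables = {}, []
for key, value in [("b", 1), ("a", 2), ("c", 3), ("a", 4)]:
    memtable[key] = value
    if len(memtable) >= FLUSH_THRESHOLD:
        sstables.append(flush(memtable))
        memtable = {}  # start a fresh memtable after the flush

print(sstables)           # [(('a', 2), ('b', 1)), (('a', 4), ('c', 3))]
print(compact(sstables))  # (('a', 4), ('b', 1), ('c', 3))
```

Note that key `a` lives in two SSTables until compaction, which is exactly the read overhead that merging removes.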

Why write to the commit log instead of directly to an SSTable?
An SSTable stores rows sorted by key, whereas the commit log stores data in the order Cassandra receives it. Appending to the end of a file is always cheaper than placing each new write at its sorted position.
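The cost difference can be illustrated in memory with a small Python sketch. This only illustrates the reasoning above; Cassandra does not insert into SSTables in place, and `append_only` and `sorted_insert` are hypothetical helpers. Appending never moves existing records, while keeping a structure sorted as writes arrive forces later elements to shift.

```python
import bisect


def append_only(log, record):
    # Commit-log style: each new record goes to the end; nothing moves.
    log.append(record)


def sorted_insert(table, record):
    # Sorted-in-place style: find the record's sorted position, then shift
    # every later element one slot right. Returns how many elements moved.
    index = bisect.bisect_left(table, record)
    moved = len(table) - index
    table.insert(index, record)
    return moved


log, table, total_moved = [], [], 0
for key in [5, 3, 9, 1, 7]:  # keys arrive in arbitrary order
    append_only(log, key)
    total_moved += sorted_insert(table, key)

print(log)          # [5, 3, 9, 1, 7]
print(table)        # [1, 3, 5, 7, 9]
print(total_moved)  # 5
```

Cassandra sidesteps this cost by keeping the sorting work in memory (the memtable) and writing each SSTable once, in order, at flush time.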

In addition, the commit log is optimized for sequential writes, so writing to it is faster than writing to an SSTable. Cassandra keeps track of which data has already been flushed to SSTables and truncates the commit log accordingly, discarding the parts whose data is safely on disk.
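One way to picture this bookkeeping is a commit log split into segments that are dropped once everything in them has been flushed. The sketch below is a hypothetical model (`ToyCommitLog`, `segment_size` and `discard_flushed` are invented names), not Cassandra's implementation.

```python
class ToyCommitLog:
    """Hypothetical segmented commit log (not Cassandra's implementation)."""

    def __init__(self, segment_size):
        self.segment_size = segment_size
        self.segments = [[]]  # each segment is a list of (position, mutation)
        self.next_position = 0

    def append(self, mutation):
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])  # current segment is full: start a new one
        position = self.next_position
        self.segments[-1].append((position, mutation))
        self.next_position += 1
        return position

    def discard_flushed(self, flushed_up_to):
        # Drop every segment whose mutations are all at or below the position
        # already persisted in SSTables; keep the rest in case of replay.
        self.segments = [
            segment for segment in self.segments
            if not segment or segment[-1][0] > flushed_up_to
        ]
        if not self.segments:
            self.segments = [[]]  # always keep an active segment to append to


commit_log = ToyCommitLog(segment_size=2)
for mutation in ["a", "b", "c", "d", "e"]:
    commit_log.append(mutation)
print(len(commit_log.segments))  # 3

# Suppose data up to position 2 has now been flushed to an SSTable.
commit_log.discard_flushed(flushed_up_to=2)
print(len(commit_log.segments))  # 2
```

The segment holding positions 2 and 3 survives because position 3 is not yet flushed; a segment can only be discarded when all of its data is on disk.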



