Sunday, 8 October 2023

How Sequential I/O Makes Kafka Fast and Scalable

Sequential I/O is a key factor in the high performance and scalability of Kafka. Let's first understand what sequential I/O is, and then look at the benefits Kafka gets from it.

What is a sequential I/O operation?

Sequential I/O means reading or writing data on a disk or other storage medium in a linear order, one block after another. Random access I/O is different: you can jump directly to any part of the data without reading or writing the parts that come before it.
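To make the contrast concrete, here is a small sketch using java.io.RandomAccessFile, which supports both patterns: readSequentially walks the file from byte 0 to the end, while readAt uses seek() to jump straight to one position. The class name, the temporary file and its contents are illustrative assumptions; the file is read one byte at a time, unbuffered, only to keep the access pattern visible.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class AccessPatterns {

    // Sequential access: start at byte 0 and read every byte in order until end of file.
    static String readSequentially(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = raf.read()) != -1) {
                sb.append((char) b);
            }
            return sb.toString();
        }
    }

    // Random access: jump straight to 'position' with seek(), skipping everything before it.
    static char readAt(Path file, long position) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(position);
            return (char) raf.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo file with ten ASCII bytes.
        Path file = Files.createTempFile("access-demo", ".txt");
        Files.write(file, "ABCDEFGHIJ".getBytes(StandardCharsets.US_ASCII));

        System.out.println("Sequential read : " + readSequentially(file));
        System.out.println("Byte at offset 7: " + readAt(file, 7));

        Files.delete(file);
    }
}
```

Running main prints the whole string for the sequential read and the single character H for position 7.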

 

Advantages of Sequential IO

a.   Sequential IO is simple to implement.

b.   Sequential I/O is much faster than random access I/O, especially on hard disks, because it minimizes seek time.

 

Example of sequential IO in Java

The FileInputStream, FileOutputStream, FileWriter, BufferedWriter, FileReader and BufferedReader classes perform sequential I/O operations.

 


 

SequentialIO.java

package com.sample.app;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class SequentialIO {

	public static void main(String[] args) {
		// Define the file path
		String filePath = "sample.txt";

		// Data to write to the file
		String[] data = { "Hey Guys!!!!", "This is a Java sequential I/O example.",
				"We are writing data to a text file line by line." };

		// Open the file in append mode
		try (FileWriter fileWriter = new FileWriter(filePath, true);
				BufferedWriter bufferedWriter = new BufferedWriter(fileWriter)) {

			for (String line : data) {
				bufferedWriter.write(line);
				bufferedWriter.newLine();
			}

			System.out.println("Data has been written to : " + filePath);
		} catch (IOException e) {
			e.printStackTrace();
		}

		// Read the data
		System.out.println("Reading data from file : " + filePath + "\n");
		try (FileReader fileReader = new FileReader(filePath);
				BufferedReader bufferedReader = new BufferedReader(fileReader);) {

			String line;
			while ((line = bufferedReader.readLine()) != null) {
				System.out.println(line);
			}

		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}

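Output (first run; because the file is opened in append mode, each later run adds the three lines again)

Data has been written to : sample.txt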
Reading data from file : sample.txt

Hey Guys!!!!
This is a Java sequential I/O example.
We are writing data to a text file line by line.

What is seek time?

Seek time is the time the read/write head takes to move to the desired location on a physical storage medium, such as a hard disk or magnetic tape.
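A rough back-of-the-envelope estimate shows why seek time dominates on a hard disk. The sketch below assumes illustrative figures (10 ms per random access for seek plus rotational delay, a 100 MB/s transfer rate with 1 MB = 1,000,000 bytes, and 4 KiB blocks); these are not measurements of any particular drive.

```java
public class SeekTimeEstimate {

    // ASSUMED, illustrative hard-disk figures (not measurements):
    static final double SEEK_MS = 10.0;            // average seek + rotational delay per random access
    static final double TRANSFER_MB_PER_S = 100.0; // sustained transfer rate, 1 MB = 1,000,000 bytes
    static final int BLOCK_BYTES = 4096;           // 4 KiB per block

    // Time to transfer 'bytes' once the head is already in position.
    static double transferMs(long bytes) {
        return bytes / (TRANSFER_MB_PER_S * 1_000_000.0) * 1000.0;
    }

    // Random access: every block pays a full seek before its own transfer.
    static double randomMs(int blocks) {
        return blocks * (SEEK_MS + transferMs(BLOCK_BYTES));
    }

    // Sequential access: one seek to the start, then a single contiguous transfer.
    static double sequentialMs(int blocks) {
        return SEEK_MS + transferMs((long) blocks * BLOCK_BYTES);
    }

    public static void main(String[] args) {
        int blocks = 1000;
        System.out.printf("Random     : %.1f ms%n", randomMs(blocks));
        System.out.printf("Sequential : %.1f ms%n", sequentialMs(blocks));
    }
}
```

With these assumed figures, reading 1,000 scattered blocks takes about 10 seconds, while reading the same 4,096,000 bytes contiguously takes about 51 ms. The exact ratio depends on the hardware; on SSDs the gap is much smaller because there is no mechanical head to move.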

 

How does Kafka benefit from sequential I/O?

Internally, Kafka uses sequential I/O both for writing messages to topics and for reading messages from them.

 

In Kafka, every topic has one or more partitions. Each partition maintains an append-only log: new messages are always appended to the end of the existing log.
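The following is a minimal, hypothetical in-memory model of one partition's log, assuming messages are plain strings. It shows the two properties described above: append always adds to the end, and each message's offset is simply its position in the log. Real Kafka stores records as bytes in segment files on disk; this class only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, in-memory model of one partition's append-only log (not Kafka's real implementation).
public class PartitionLog {

    private final List<String> messages = new ArrayList<>();

    // New messages always go to the end; the returned offset is the message's position in the log.
    public long append(String message) {
        messages.add(message);
        return messages.size() - 1L;
    }

    // Looks up a message by offset. There is intentionally no update or delete method.
    public String read(long offset) {
        return messages.get(Math.toIntExact(offset));
    }

    // Offset that the next appended message will receive.
    public long nextOffset() {
        return messages.size();
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        long first = log.append("order-created");
        long second = log.append("order-paid");
        System.out.println(first + " -> " + log.read(first));
        System.out.println(second + " -> " + log.read(second));
        System.out.println("next offset: " + log.nextOffset());
    }
}
```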

 

High throughput

Sequential writes are efficient because Kafka only appends data to the end of the log; it never has to seek to specific or random positions in the file.
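At the file level, an append-only write can be sketched with java.nio's FileChannel opened with StandardOpenOption.APPEND, where every write lands at the current end of the file. The record format (one UTF-8 line per record) and the class and method names are assumptions for illustration, not Kafka's on-disk format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class AppendOnlyWriter {

    // Appends each record to the end of the file and returns the byte position where each one started.
    static List<Long> appendAll(Path file, List<String> records) throws IOException {
        List<Long> startPositions = new ArrayList<>();
        // APPEND: every write goes to the current end of the file; the code never seeks.
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (String record : records) {
                startPositions.add(channel.size()); // the record starts where the file currently ends
                ByteBuffer buffer = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            }
        }
        return startPositions;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append-demo", ".log");
        List<Long> positions = appendAll(file, List.of("m0", "m1", "m2"));
        System.out.println("Records started at byte positions: " + positions);
        Files.delete(file);
    }
}
```

Each record here is three bytes ("m0" plus a newline), so the start positions simply grow by three with every append.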

 

Reads work the same way. A consumer starts reading messages from a specific offset and then continues reading them sequentially from that point.
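A consumer-style read can be sketched as a single seek to the starting offset followed by sequential reads. This assumes a hypothetical format of length-prefixed records written with DataOutputStream.writeUTF, plus an in-memory list mapping each offset to its byte position. Kafka keeps its own per-segment offset index for this lookup; the list here is only a simplified stand-in.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class OffsetReader {

    // Writes a fresh log of length-prefixed records and returns a hypothetical index
    // where index.get(offset) is the byte position of that message.
    static List<Long> writeLog(Path file, List<String> messages) throws IOException {
        List<Long> index = new ArrayList<>();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file.toFile())))) {
            for (String message : messages) {
                index.add((long) out.size()); // bytes written so far = start of this record
                out.writeUTF(message);        // 2-byte length prefix + modified UTF-8 bytes
            }
        }
        return index;
    }

    // Consumer side: one seek to the starting offset's position, then sequential reads to the end.
    static List<String> readFrom(Path file, List<Long> index, int offset) throws IOException {
        List<String> result = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(index.get(offset)); // the only seek
            while (raf.getFilePointer() < raf.length()) {
                result.add(raf.readUTF()); // each read continues where the previous one ended
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("partition", ".log");
        List<Long> index = writeLog(file, List.of("m0", "m1", "m2", "m3"));
        System.out.println("Byte positions by offset: " + index);
        System.out.println("Consumer reading from offset 2: " + readFrom(file, index, 2));
        Files.delete(file);
    }
}
```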

 

Data Integrity

In an append-only log, a message cannot be modified once it has been written. Because existing data is never overwritten in place, integrity problems caused by in-place updates to the same record are avoided.

 

Scalability

Each partition's log is further divided into segments, which are files that grow up to a configurable maximum size. When the active segment is full, Kafka creates a new segment file for the partition, so no single file grows without bound.
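Segment rolling can be modelled as follows. This is a simplified, hypothetical sketch of the size-based rule only: each segment is named after the offset of its first message, zero-padded to 20 digits (matching Kafka's segment file naming, e.g. 00000000000000000000.log), and a new segment starts when the next record would push the active one past the maximum. The tiny 100-byte limit is chosen just to make the rolls visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of size-based segment rolling; sizes are in bytes and deliberately tiny.
public class SegmentedLog {

    // One segment: the offset of its first message plus how many bytes it currently holds.
    static final class Segment {
        final long baseOffset;
        long sizeBytes;

        Segment(long baseOffset) {
            this.baseOffset = baseOffset;
        }

        // Kafka-style file name: base offset zero-padded to 20 digits.
        String fileName() {
            return String.format("%020d.log", baseOffset);
        }
    }

    private final long maxSegmentBytes;
    private final List<Segment> segments = new ArrayList<>();
    private long nextOffset = 0;

    SegmentedLog(long maxSegmentBytes) {
        this.maxSegmentBytes = maxSegmentBytes;
        segments.add(new Segment(0)); // the first segment starts at offset 0
    }

    // Appends a record of 'recordBytes' bytes and returns its offset.
    // Rolls to a new segment if the non-empty active segment would overflow.
    long append(int recordBytes) {
        Segment active = segments.get(segments.size() - 1);
        if (active.sizeBytes > 0 && active.sizeBytes + recordBytes > maxSegmentBytes) {
            active = new Segment(nextOffset); // new segment is named after this record's offset
            segments.add(active);
        }
        active.sizeBytes += recordBytes;
        return nextOffset++;
    }

    List<String> segmentFileNames() {
        List<String> names = new ArrayList<>();
        for (Segment s : segments) {
            names.add(s.fileName());
        }
        return names;
    }

    public static void main(String[] args) {
        SegmentedLog log = new SegmentedLog(100);
        for (int i = 0; i < 5; i++) {
            log.append(40); // five 40-byte records: two fit per 100-byte segment
        }
        log.segmentFileNames().forEach(System.out::println);
    }
}
```

With two 40-byte records per segment, the five appends produce three segments whose base offsets are 0, 2 and 4.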

 

Simple

Sequential I/O is simple to implement and manage.

 

Operating systems and storage systems also optimize sequential reads

Sequential I/O benefits from the following operating system and storage optimizations, even when the data is not stored strictly contiguously on disk.

 

Read-ahead and caching: the system anticipates the next data blocks that will be read and preloads them into a cache in memory. This prefetching reduces read latency. It does not help much with random access I/O, because the next block to be read cannot be predicted.
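The effect of read-ahead can be illustrated with a toy cache. This is a hypothetical model, not how any particular operating system implements it: after every read it prefetches the next few block numbers, and it never evicts anything. Reading blocks in order hits the prefetched data almost every time, while a scattered pattern does not.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical read-ahead cache: a toy model with an unbounded cache and no eviction.
public class ReadAheadCache {

    private final int readAheadBlocks;               // how many following blocks to prefetch per read
    private final Set<Long> cache = new HashSet<>(); // block numbers currently held in memory
    private int hits = 0;
    private int misses = 0;

    ReadAheadCache(int readAheadBlocks) {
        this.readAheadBlocks = readAheadBlocks;
    }

    // Reads one block: a hit if it was already prefetched, otherwise a miss that "goes to disk".
    void read(long block) {
        if (cache.contains(block)) {
            hits++;
        } else {
            misses++;
            cache.add(block);
        }
        // Prefetch the next blocks, betting that the reader will continue sequentially.
        for (long next = block + 1; next <= block + readAheadBlocks; next++) {
            cache.add(next);
        }
    }

    int hits() { return hits; }

    int misses() { return misses; }

    public static void main(String[] args) {
        ReadAheadCache sequential = new ReadAheadCache(4);
        for (long b = 0; b < 100; b++) {
            sequential.read(b);
        }

        ReadAheadCache random = new ReadAheadCache(4);
        long[] pattern = {50, 3, 87, 21, 64, 9, 42, 95, 16, 73}; // scattered block numbers
        for (long b : pattern) {
            random.read(b);
        }

        System.out.println("Sequential: " + sequential.hits() + " hits, " + sequential.misses() + " misses");
        System.out.println("Random    : " + random.hits() + " hits, " + random.misses() + " misses");
    }
}
```

Running main reports 99 hits and 1 miss for the 100 sequential reads, and 0 hits and 10 misses for the scattered pattern.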

 

In summary, Kafka gets much of its performance from using sequential I/O to read and write messages.

 

Note

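If you open hundreds of files and perform I/O on all of them at the same time, the hard drive's read/write head has to keep jumping between them. The access pattern then becomes effectively random, and throughput drops.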


