Sequential I/O is a key factor in the high performance and scalability of Kafka. Let’s first understand what sequential IO is, and then look at the benefits Kafka gets from it.
What is a sequential IO operation?
Sequential IO is the process of reading data from, or writing data to, a disk or other storage medium in a linear (step by step) manner. This is different from random access IO, where you can access any part of the data without reading or writing the preceding parts.
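For contrast, here is a minimal sketch of random access IO using java.io.RandomAccessFile. The class name RandomAccessDemo, the helper readAt, and the file contents are assumptions made up for this illustration; the point is only that seek() jumps straight to a byte position without reading what comes before it.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RandomAccessDemo {

    // Hypothetical helper: reads `length` bytes starting at byte `position`,
    // without reading anything that precedes that position.
    static String readAt(Path file, long position, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(position);              // jump directly to the requested byte
            byte[] buffer = new byte[length];
            raf.readFully(buffer);           // read exactly `length` bytes
            return new String(buffer, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("random-access-demo", ".txt");
        Files.write(file, "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readAt(file, 10, 3)); // prints ABC
        Files.delete(file);
    }
}
```

On a spinning disk, each such jump may cost a head movement, which is exactly the cost a sequential reader avoids.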
Advantages of Sequential IO
a. Sequential IO is simple to implement.
b. Sequential IO is considerably faster than random access IO, because it minimizes the seek time.
Example of sequential IO in Java
The FileInputStream, FileOutputStream, FileWriter, BufferedWriter, FileReader and BufferedReader classes perform sequential IO operations.
SequentialIO.java
package com.sample.app;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class SequentialIO {

    public static void main(String[] args) {
        // Define the file path
        String filePath = "sample.txt";

        // Data to write to the file
        String[] data = { "Hey Guys!!!!", "This is a Java sequential I/O example.",
                "We are writing data to a text file line by line." };

        // Open the file in append mode and write the lines one after another
        try (FileWriter fileWriter = new FileWriter(filePath, true);
                BufferedWriter bufferedWriter = new BufferedWriter(fileWriter)) {
            for (String line : data) {
                bufferedWriter.write(line);
                bufferedWriter.newLine();
            }
            System.out.println("Data has been written to : " + filePath);
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Read the data back from the start of the file, line by line
        System.out.println("Reading data from file : " + filePath + "\n");
        try (FileReader fileReader = new FileReader(filePath);
                BufferedReader bufferedReader = new BufferedReader(fileReader)) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output
Data has been written to : sample.txt
Reading data from file : sample.txt

Hey Guys!!!!
This is a Java sequential I/O example.
We are writing data to a text file line by line.
What is seek time?
It is the time taken by the read/write head to move to the desired location on a physical storage medium, such as a magnetic tape or hard disk.
How does Kafka benefit from Sequential IO?
Kafka internally uses a sequential I/O approach for writing messages to and reading messages from topics.
In Kafka, every topic has one or more partitions associated with it. Each partition maintains an append-only log, where new messages are appended to the end of the existing log file.
High throughput
A sequential write is more efficient because it just appends data to the end of the log, without needing to seek to specific/random positions in the file.
The same is true for reads: a consumer starts reading from a specific offset and then continues to read the following messages sequentially.
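The append and read-from-offset behaviour described above can be sketched as a toy, single-file log. This is not Kafka's implementation or on-disk format: the class AppendOnlyLogSketch, the length-prefixed record layout, and the in-memory offset counter are all assumptions chosen for illustration. The sketch also scans from the start of the file to reach the requested offset, whereas Kafka keeps an offset index so it can start close to the right position.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class AppendOnlyLogSketch {
    private final Path file;
    private long nextOffset; // assumes the file starts out empty

    AppendOnlyLogSketch(Path file) {
        this.file = file;
    }

    // Appends one message to the end of the file and returns its offset.
    long append(String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file.toFile(), true)))) {
            out.writeInt(payload.length); // hypothetical record layout: length, then bytes
            out.write(payload);
        }
        return nextOffset++;
    }

    // Reads the log front to back and returns every message at or after startOffset.
    List<String> readFrom(long startOffset) throws IOException {
        List<String> messages = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file.toFile())))) {
            long offset = 0;
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfLog) {
                    break; // no more records
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                if (offset >= startOffset) {
                    messages.add(new String(payload, StandardCharsets.UTF_8));
                }
                offset++;
            }
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("partition-0", ".log");
        AppendOnlyLogSketch log = new AppendOnlyLogSketch(file);
        log.append("m0");
        log.append("m1");
        log.append("m2");
        System.out.println(log.readFrom(1)); // prints [m1, m2]
        Files.delete(file);
    }
}
```

Note that neither method ever moves backwards in the file or overwrites existing bytes: writes only extend the end, and reads only advance.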
Data Integrity
Because the log file is append-only, a message can’t be modified once it is written, so existing data can never be corrupted by an in-place update.
Scalability
Each partition log is further divided into segments (files capped at a configured maximum size). When the active segment is full, a new segment file is created for the partition, and new messages are appended to it.
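The rolling rule can be sketched without touching the disk. The class SegmentRollSketch, the method baseOffsets, and the record sizes below are hypothetical; the sketch only decides which record starts each new segment when a size cap is applied, and prints file names derived from that first offset, which is the naming style Kafka uses for its segment files.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentRollSketch {

    // Returns the offset of the first record in each segment, given the size of
    // every record (in bytes) and the maximum number of bytes per segment.
    static List<Long> baseOffsets(int[] recordSizes, int maxSegmentBytes) {
        List<Long> bases = new ArrayList<>();
        int bytesInActiveSegment = 0;
        for (int offset = 0; offset < recordSizes.length; offset++) {
            boolean noSegmentYet = bases.isEmpty();
            boolean wouldOverflow =
                    bytesInActiveSegment + recordSizes[offset] > maxSegmentBytes;
            if (noSegmentYet || wouldOverflow) {
                bases.add((long) offset);   // roll: this record starts a new segment
                bytesInActiveSegment = 0;
            }
            bytesInActiveSegment += recordSizes[offset];
        }
        return bases;
    }

    public static void main(String[] args) {
        int[] sizes = { 40, 40, 40, 40, 40 };
        for (long base : baseOffsets(sizes, 100)) {
            // Name each segment after its first offset, zero-padded to 20 digits
            System.out.println(String.format("%020d.log", base));
        }
        // prints:
        // 00000000000000000000.log
        // 00000000000000000002.log
        // 00000000000000000004.log
    }
}
```

Because only the newest segment is ever written to, older segments can be deleted or moved as whole files without disturbing ongoing appends.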
Simple
Sequential IO is simple to implement and manage.
Operating system and storage system optimizations for sequential reads
Sequential IO benefits from the following optimization by the operating system and storage systems, even when the data is not stored on the medium in a strictly sequential manner.
Read-ahead and cache mechanism: The system can anticipate the next data block to be read and preload it into cache or memory, which reduces read latency. With random access IO the next block is hard to predict, so this prefetching is far less effective.
In summary, Kafka gains the performance benefits of sequential IO for both reading and writing messages.
Note
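If you open hundreds of files and perform I/O operations on all of them at once, the read/write head of a hard disk has to keep moving between them, so the access pattern becomes effectively random and throughput drops.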