HDFS stands for Hadoop Distributed File System. It is designed to store very large files across the machines of a Hadoop cluster. It is fault tolerant and can run on commodity hardware. Hadoop is written in Java, so you can run it on any platform that supports Java.
Distributed File System
A distributed file system is software that manages storage across a network of machines.
Hadoop runs on Commodity Hardware
HDFS does not require expensive hardware; it can run on a cluster of commodity machines (laptops, desktops, low-cost servers, and so on).
HDFS Coherence model
HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed.
HDFS Architecture
HDFS works on a master-slave architecture. There are two main components in HDFS.
a. Namenode
b. Datanode
The following image is taken from the Hadoop documentation.
Namenode
It is the master server; it maintains metadata about the data stored in HDFS and records any changes to the file system. The default block size is 64MB in Hadoop 1.x and 128MB in Hadoop 2.x.
Suppose a client wants to store 270MB of data in HDFS (assume the block size is configured to 64MB). The following sequence happens.
a. The 270MB is divided into 5 blocks (64MB, 64MB, 64MB, 64MB, 14MB; the arithmetic is sketched just after this sequence). The client sends a request to the Namenode (with metadata about the file) saying that it wants to store 5 blocks of data. The Namenode responds to the client by specifying which data nodes should store the data. Let's assume the Namenode sends a response like the following.
Store Block1 in Datanode5
Store Block2 in Datanode135
Store Block3 in Datanode1124
Store Block4 in Datanode3
Store Block5 in Datanode87
b. Now the client sends the data blocks to the specified data nodes. (Replication is not covered here; it is discussed in a later section.)
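For illustration, here is a small sketch of the block-split arithmetic the client performs; the class below is purely illustrative and not part of the HDFS API.

public class BlockSplitExample {
    public static void main(String[] args) {
        long fileSizeMb  = 270;  // size of the file to store
        long blockSizeMb = 64;   // Hadoop 1.x default block size
        long fullBlocks  = fileSizeMb / blockSizeMb;               // 4 full 64MB blocks
        long lastBlockMb = fileSizeMb % blockSizeMb;               // 14MB left over
        long totalBlocks = fullBlocks + (lastBlockMb > 0 ? 1 : 0); // 5 blocks in total
        System.out.println(totalBlocks + " blocks, last block " + lastBlockMb + "MB");
    }
}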
In the same way, if a client wants to read file “A” from HDFS, it sends a request to the Namenode asking which datanodes hold file “A”. Once the Namenode sends the metadata, the client reads the data blocks from the respective data nodes.
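To an application, this whole exchange is hidden behind Hadoop's FileSystem API. Below is a minimal read sketch, assuming a reachable cluster configured in core-site.xml and a hypothetical file /user/demo/A.txt.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);       // client entry point; metadata comes from the Namenode
        Path file = new Path("/user/demo/A.txt");   // hypothetical path
        // open() asks the Namenode for block locations; the bytes are then
        // streamed from the Datanodes that hold the blocks.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(file)))) {
            System.out.println(reader.readLine());
        }
    }
}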
Datanodes
Datanodes are
the actual place where the data is stored. The DataNodes are responsible for
serving read and write requests from the file system’s clients. The DataNodes
also perform block creation, deletion, and replication upon instruction from
the NameNode.
Replication Factor
An application can specify how many replicas of a file should be stored in HDFS. The default replication factor in Hadoop is 3, which means each data block has 3 copies in the Hadoop cluster. The replicas of a data block are always kept on different machines.
How replication happens
Let's suppose you want to save a file “A.txt” of size 270MB into HDFS. The following sequence of steps happens.
1. The default block size is 64MB in Hadoop 1.x and 128MB in Hadoop 2.x, and it is configurable. For this example, assume the block size is 64MB, so the client divides the file 'A.txt' into 5 blocks.
Block1 – 64MB
Block2 – 64MB
Block3 – 64MB
Block4 – 64MB
Block5 – 14MB
The client sends the metadata of the file “A.txt” and the block details to the Namenode.
2. The Namenode receives the client request, checks its metadata about HDFS, and responds to the client by specifying where to store each block. If the replication factor is 3, the response may look like the following.
Store Block1 in Datanodes 153, 21, 39
Store Block2 in Datanodes 13, 543, 9
Store Block3 in Datanodes 1, 243, 567
Store Block4 in Datanodes 97, 321, 6
Store Block5 in Datanodes 64, 765, 98
3. The Namenode sends the block names and datanode locations for storing the data in HDFS. The client then copies each block to the closest data node. That data node is responsible for copying the block to a second datanode, and the second copies it to the third. After the block is written successfully to the third data node, the third datanode sends an acknowledgement to the second datanode, the second sends an acknowledgement to the first, and the first sends an acknowledgement to the client.
As you can observe from this flow, the client copies the data to only one of the data nodes, and the framework takes care of the replication between datanodes.
Note
The blocks
of a file are replicated for fault tolerance. The block size and replication
factor are configurable per file. The replication factor can be specified at
file creation time and can be changed later.
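A minimal sketch of setting both per file through the Java FileSystem API; the path and values below are only examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/A.txt");   // hypothetical path
        short replication = 3;                      // 3 copies of every block
        long blockSize = 64L * 1024 * 1024;         // 64MB blocks
        // Block size and replication factor are specified at file creation time.
        try (FSDataOutputStream out = fs.create(file, true, 4096, replication, blockSize)) {
            out.writeUTF("hello hdfs");
        }
        // The replication factor can also be changed later.
        fs.setReplication(file, (short) 2);
    }
}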
What is a Block report
All datanodes periodically send the Namenode a report containing the list of blocks stored on that data node.
What is a heartbeat
The Namenode periodically receives heartbeats from the datanodes in the cluster. Receipt of a heartbeat implies that the DataNode is functioning properly.
How HDFS selects a data node to satisfy a client read request
HDFS tries to
satisfy a read request from a datanode that is closest to the client.
How do file deletion and restore happen?
The Hadoop documentation says…
When a user
or an application deletes a file, it is not immediately removed from HDFS.
Instead, HDFS first renames it to a file in the /trash directory. The file can
be restored quickly as long as it remains in /trash. A file remains in /trash
for a configurable amount of time. After the expiry of its life in /trash, the
NameNode deletes the file from the HDFS namespace. The deletion of a file
causes the blocks associated with the file to be freed. Note that there could
be an appreciable time delay between the time a file is deleted by a user and
the time of the corresponding increase in free space in HDFS.
A user can
Undelete a file after deleting it as long as it remains in the /trash
directory. If a user wants to undelete a file that he/she has deleted, he/she
can navigate the /trash directory and retrieve the file. The /trash directory
contains only the latest copy of the file that was deleted. The /trash
directory is just like any other directory with one special feature: HDFS
applies specified policies to automatically delete files from this directory.
The current default trash interval is 0, which means files are deleted without being stored in trash. This is a configurable parameter, fs.trash.interval, set in core-site.xml.
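A minimal sketch of trash-aware deletion from Java, assuming the org.apache.hadoop.fs.Trash client class; the path and interval below are only examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class DeleteToTrash {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.trash.interval", "1440");     // keep deleted files for 24 hours (value in minutes)
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/A.txt");  // hypothetical path
        Trash trash = new Trash(fs, conf);
        trash.moveToTrash(file);                   // renames the file into the trash directory
        // fs.delete(file, false) would bypass trash and free the blocks directly.
    }
}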
How file system metadata persists
The Namenode maintains two components for metadata persistence.
1. EditLog: a transaction log that persistently records every change that occurs to the file system metadata.
2. FsImage: a snapshot of the file system metadata (for example, which blocks are stored at which locations) at a given point in time. The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in the FsImage file. You can think of the FsImage as a periodic snapshot of the EditLog. If a name node crashes, it can restore its state by first loading the FsImage and then replaying all the operations (also called edits or transactions) in the edit log to catch up to the most recent state of the namesystem.
The latest FsImage is created by merging the previous FsImage with the changes in the edit log file.
Does the NameNode perform the merging of the FsImage and edit log files?
No. The secondary name node is responsible for merging the FsImage and edit log files. The name node makes the current FsImage and edit log available to the secondary name node, which reads them, merges them, and writes the new FsImage back for the name node to use.
Once the merging process completes successfully,
a. the old FsImage is replaced with the new FsImage, and
b. the edit log file is cleared.
This merging process (checkpointing) runs every hour (3,600 seconds) by default, and the interval is configurable.
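The interval is normally set in hdfs-site.xml; the small sketch below only illustrates the property name, assuming the Hadoop 2.x name dfs.namenode.checkpoint.period (Hadoop 1.x used fs.checkpoint.period).

import org.apache.hadoop.conf.Configuration;

public class CheckpointInterval {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Value is in seconds; here the checkpoint runs every 30 minutes.
        conf.set("dfs.namenode.checkpoint.period", "1800");
        System.out.println(conf.get("dfs.namenode.checkpoint.period"));
    }
}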
What will happen when a name node is down?
In Hadoop 1.x, the name node is a single point of failure: when the name node fails, the entire system stops working, and manual intervention is required to bring it up and running again.
Hadoop 2.x addresses this issue with NameNode High Availability: a standby name node keeps an up-to-date copy of the namespace (FsImage and edits) and takes over as the active name node when the primary goes down. In an HA setup the standby also performs the checkpointing, so a separate secondary name node is not required.
What will happen when a data node is down?
When a data node goes down,
a. the Name node re-replicates the blocks that were stored on the failed node to other active data nodes in the cluster, and
b. the Namenode removes all the entries of the failed data node from its metadata.