File Distribution in HDFS
- Overview of HDFS:-
- HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data.
- HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients.
- In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files.
- Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes.
- The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
First, an HDFS client (the system that gives commands to HDFS) issues a command to write or read data/files.
- In the HDFS architecture there is only one namenode, and it coordinates everything in the cluster.
- The datanodes store all the data. There are many datanodes, sometimes even thousands.
- Suppose the client wants to write 200 MB of data into HDFS; the client then does two things.
Writing the data into HDFS:
- It divides the data into 128 MB blocks.
- It copies each block to three places.
- A client should know two things:
- Block size:- Larger files are divided into blocks, usually of 64 MB or 128 MB.
- Replication factor:- Each block is stored in multiple locations.
- Then the client sends the command to the namenode to write a 128 MB block with a replication factor of 3 (a minimal client-side write sketch follows this list).
- Then the namenode finds 3 datanodes for this client, based on the replication factor.
- The namenode replies with the addresses of the 3 datanodes, sorted in increasing distance from the client.
- The client then sends the data to the first datanode only, and that datanode stores the data on its hard drive.
- While the datanode is receiving the data, it forwards the same data to the next datanode.
- This process goes on until all 3 datanodes have the data; note that the client itself sends the data to the first datanode only.
- After all blocks are written, the client closes the file. The namenode then stores all the metadata in persistent storage (on its hard disk).
- In short, the client divides the file into blocks and asks the namenode for the addresses of the datanodes.
- All the datanodes store data via a replication pipeline.
- While the first datanode is receiving the data from the client, it simultaneously writes the data to the next datanode, and so on, depending upon the replication factor. This is called the “Replication Pipeline”.
- Client:- Divides the file into blocks.
- Namenode:- Provides the addresses of the datanodes.
- Datanode:- Stores data via the replication pipeline.
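To make the write path concrete, here is a minimal sketch using the Hadoop Java client (org.apache.hadoop.fs.FileSystem). The path, buffer size, and sizes are illustrative, and fs.defaultFS is assumed to point at your cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // assumes fs.defaultFS is set, e.g. hdfs://namenode:9000
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/example.txt");   // hypothetical path
        short replication = 3;                      // replication factor
        long blockSize = 128L * 1024 * 1024;        // 128 MB block size

        // create(path, overwrite, bufferSize, replication, blockSize)
        try (FSDataOutputStream out = fs.create(file, true, 4096, replication, blockSize)) {
            out.writeBytes("hello HDFS");           // streamed to the datanodes via the replication pipeline
        }
        // Closing the stream completes the write; the namenode then persists the file's metadata.
    }
}
```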
Reading files from HDFS:
- For reading a file from HDFS, the client first gives information about the file (filename, etc.) to the namenode.
- Then the namenode replies with:
- A list of all blocks for this file.
- A list of datanodes for each block.
- Then the client knows how many blocks to download and the datanodes where each block is stored (a minimal read sketch follows this list).
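A read-side sketch with the same client API; getFileBlockLocations is the call that returns the block list and the datanodes for each block (the path is again hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/example.txt");   // hypothetical path

        // Ask the namenode which blocks make up the file and where each one lives.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset() + " hosts=" + String.join(",", b.getHosts()));
        }

        // Then stream the actual bytes from the datanodes.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n));
        }
    }
}
```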
Fault Tolerance:
- Types of faults and their detection:
- Node failure.
- Communication failure.
- Data corruption.
- Node failure:
- If the namenode is dead, the entire cluster is dead. The namenode is a single point of failure.
Detecting Datanode Failure:-
- Every datanode sends a heartbeat message every 3 seconds.
- If the namenode does not get a heartbeat for about 10 minutes, it considers the datanode dead (see the timeout sketch below).
- It may actually be a network failure, but the namenode treats a dead datanode and a broken network the same way.
- Whenever data is sent, an acknowledgment is expected in reply; if no acknowledgment arrives (after several retries), the sender assumes that the host is dead or the network has failed.
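The “10 minutes” above comes from the namenode's timeout rule. As a sketch, assuming the stock defaults of dfs.heartbeat.interval = 3 s and dfs.namenode.heartbeat.recheck-interval = 5 min, the commonly cited formula works out like this:

```java
public class DeadNodeTimeout {
    public static void main(String[] args) {
        long heartbeatIntervalSec = 3;          // dfs.heartbeat.interval (default 3 s)
        long recheckIntervalMs = 5 * 60_000L;   // dfs.namenode.heartbeat.recheck-interval (default 5 min)

        // Commonly cited rule: timeout = 2 * recheck-interval + 10 * heartbeat-interval
        long timeoutMs = 2 * recheckIntervalMs + 10 * heartbeatIntervalSec * 1000;
        System.out.println("dead-node timeout = " + timeoutMs / 60_000.0 + " minutes");  // 10.5 minutes
    }
}
```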
Corrupted Data:
- A checksum is sent along with all transmitted data.
- Whenever the datanode stores the data, it also stores the checksum (a checksum sketch follows this list).
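A sketch of per-chunk checksumming, assuming the default chunk size of 512 bytes (dfs.bytes-per-checksum). Real HDFS uses CRC32C; plain java.util.zip.CRC32 keeps this sketch dependency-free:

```java
import java.util.zip.CRC32;

public class ChunkChecksums {
    static final int BYTES_PER_CHECKSUM = 512;   // HDFS default (dfs.bytes-per-checksum)

    // Compute one CRC32 per 512-byte chunk of block data.
    public static long[] checksums(byte[] data) {
        int n = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[n];
        for (int i = 0; i < n; i++) {
            CRC32 crc = new CRC32();
            int from = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - from);
            crc.update(data, from, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] block = new byte[1500];           // pretend block data
        long[] stored = checksums(block);        // stored alongside the data
        block[700] ^= 1;                         // simulate bit rot on disk
        long[] now = checksums(block);
        for (int i = 0; i < stored.length; i++)
            if (stored[i] != now[i])
                System.out.println("chunk " + i + " is corrupted");
    }
}
```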
Detecting a corrupted hard drive:
- Periodically, every datanode sends a BlockReport to the namenode.
- BlockReport:- the list of all blocks the datanode has.
- Before sending the report, the datanode checks whether the checksums are OK; it does not send info for blocks that are corrupted.
- So all datanodes send heartbeats every 3 seconds to say they are alive, and periodically send BlockReports as well. Because corrupted blocks are skipped (detected via their checksums), the namenode learns which blocks are lost.
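A toy model of the report-building step: the datanode recomputes each block's checksum and includes only the blocks that still verify, so the namenode can infer the rest are lost. StoredBlock and its fields are hypothetical stand-ins, not HDFS types:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockReportSketch {
    // Hypothetical stand-in: a block on disk with its stored and freshly recomputed checksums.
    record StoredBlock(long id, long storedChecksum, long recomputedChecksum) {}

    // Build the BlockReport: include only blocks whose checksums still match.
    static List<Long> buildBlockReport(List<StoredBlock> local) {
        List<Long> report = new ArrayList<>();
        for (StoredBlock b : local) {
            if (b.storedChecksum() == b.recomputedChecksum()) {
                report.add(b.id());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<StoredBlock> disk = List.of(
            new StoredBlock(1, 111, 111),
            new StoredBlock(2, 222, 999));          // block 2 failed verification
        System.out.println(buildBlockReport(disk)); // prints [1]
    }
}
```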
Fault Tolerance in HDFS:
- Handling write failures:
The datanodes write a block in smaller units, usually 64 KB, called packets. Moreover, each datanode replies with an acknowledgment for each packet to confirm that it got it. If the client does not get an acknowledgment from some datanode, it knows that the datanode is dead, and it adjusts the pipeline to skip that datanode. The block's data is then under-replicated, but the namenode will take care of that later on.
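A toy model of this packet/acknowledgment loop (Datanode here is a hypothetical stand-in, not the HDFS class): any node that fails to acknowledge a packet is dropped from the pipeline, leaving the block under-replicated for the namenode to repair later:

```java
import java.util.ArrayList;
import java.util.List;

public class WritePipelineSketch {
    // Hypothetical datanode stand-in: ack() returns false if the node never acknowledges.
    interface Datanode { String name(); boolean ack(byte[] packet); }

    // Send one ~64 KB packet down the pipeline; drop any node that fails to ack.
    static List<Datanode> sendPacket(List<Datanode> pipeline, byte[] packet) {
        List<Datanode> alive = new ArrayList<>();
        for (Datanode dn : pipeline) {
            if (dn.ack(packet)) {
                alive.add(dn);
            } else {
                System.out.println(dn.name() + " did not ack; skipping it from now on");
            }
        }
        return alive;   // the adjusted pipeline used for subsequent packets
    }

    public static void main(String[] args) {
        Datanode good = new Datanode() {
            public String name() { return "dn1"; }
            public boolean ack(byte[] p) { return true; }
        };
        Datanode dead = new Datanode() {
            public String name() { return "dn2"; }
            public boolean ack(byte[] p) { return false; }
        };
        List<Datanode> adjusted = sendPacket(List.of(good, dead), new byte[64 * 1024]);
        System.out.println("pipeline now has " + adjusted.size() + " node(s)");
    }
}
```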
- Handling read failures:
When the client asks for the location of a block, the namenode gives the locations of all the datanodes that hold it. If one datanode is dead, the client simply reads from the rest.
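A sketch of that failover logic (Replica is a hypothetical stand-in for "read this block from one datanode"):

```java
import java.io.IOException;
import java.util.List;

public class ReadFailoverSketch {
    // Hypothetical replica source: throws if its datanode is dead or unreachable.
    interface Replica { byte[] read() throws IOException; }

    // Try the datanodes the namenode listed for the block, in order,
    // falling back to the next one on failure.
    static byte[] readBlock(List<Replica> replicas) throws IOException {
        IOException last = null;
        for (Replica r : replicas) {
            try {
                return r.read();
            } catch (IOException e) {
                last = e;   // this datanode failed; try the next one
            }
        }
        throw new IOException("all replicas failed", last);
    }

    public static void main(String[] args) throws IOException {
        Replica dead = () -> { throw new IOException("dn1 unreachable"); };
        Replica alive = () -> new byte[] {42};
        System.out.println(readBlock(List.of(dead, alive)).length);  // prints 1
    }
}
```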
- Handling datanode failures:
The namenode maintains two tables:
- First table:- which blocks of data are replicated on which datanodes (block → datanodes).
- Second table:- which blocks each datanode has (datanode → blocks).
The namenode continuously updates these two tables. If the namenode finds that a block on a datanode is corrupted, it updates the first table by removing the bad datanode from that block's list; and if it finds that a datanode is dead, it updates both tables.
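A sketch of the two tables as plain maps (block id → nodes, node → block ids); the update rules mirror the two cases above:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NamenodeTablesSketch {
    // First table: block -> datanodes holding a replica of it.
    final Map<Long, Set<String>> blockToNodes = new HashMap<>();
    // Second table: datanode -> blocks it stores.
    final Map<String, Set<Long>> nodeToBlocks = new HashMap<>();

    // Case 1: a replica of blockId on `node` is corrupted -> update only the first table.
    void removeCorruptReplica(long blockId, String node) {
        blockToNodes.getOrDefault(blockId, new HashSet<>()).remove(node);
    }

    // Case 2: a datanode died -> update both tables.
    void removeDeadNode(String node) {
        for (long blockId : nodeToBlocks.getOrDefault(node, new HashSet<>())) {
            blockToNodes.getOrDefault(blockId, new HashSet<>()).remove(node);
        }
        nodeToBlocks.remove(node);
    }

    public static void main(String[] args) {
        NamenodeTablesSketch nn = new NamenodeTablesSketch();
        nn.blockToNodes.put(1L, new HashSet<>(Set.of("dn1", "dn2", "dn3")));
        nn.nodeToBlocks.put("dn2", new HashSet<>(Set.of(1L)));
        nn.removeDeadNode("dn2");
        System.out.println(nn.blockToNodes.get(1L));   // dn2 is gone, dn1 and dn3 remain
    }
}
```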
Under-replicated Blocks:-
- The namenode periodically scans the first table (the list of blocks) and sees which blocks are not replicated properly. These are called “under-replicated” blocks.
- For every under-replicated block, the namenode gives the command to a datanode that does not have the block to copy it from a datanode that has a replica (see the scan sketch below).
- HDFS cannot guarantee that at least one replica will always survive, but it tries its best by smartly selecting replica locations.
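A sketch of that periodic scan against the first table, assuming a target replication factor of 3; the names are illustrative:

```java
import java.util.Map;
import java.util.Set;

public class UnderReplicationScan {
    static final int TARGET_REPLICATION = 3;

    // Scan the first table; for each block with too few live replicas,
    // tell a node without the block to copy it from a node that has one.
    static void scan(Map<Long, Set<String>> blockToNodes, Set<String> allNodes) {
        for (Map.Entry<Long, Set<String>> e : blockToNodes.entrySet()) {
            Set<String> holders = e.getValue();
            if (!holders.isEmpty() && holders.size() < TARGET_REPLICATION) {
                String source = holders.iterator().next();
                String target = allNodes.stream()
                        .filter(n -> !holders.contains(n))
                        .findFirst().orElseThrow();
                System.out.printf("block %d under-replicated: tell %s to copy from %s%n",
                        e.getKey(), target, source);
            }
        }
    }

    public static void main(String[] args) {
        // Block 1 has only 2 of its 3 replicas left; dn3 should fetch a copy.
        scan(Map.of(1L, Set.of("dn1", "dn2")), Set.of("dn1", "dn2", "dn3"));
    }
}
```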
- Replica placement strategy:-
- The cluster is divided into racks, and each rack has multiple datanodes. First-replica allocation is simple: if the writer is a member of the cluster, it is selected for the first replica; otherwise some datanode is selected.
- Next two replica locations:-
- Pick a different rack than the first replica's, and select two different datanodes on that rack.
- Subsequent replica locations:-
- Pick any random datanode that satisfies the two conditions below (a placement sketch follows this list):
- Only one replica per datanode.
- At most two replicas per rack.
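A sketch of this placement policy under the two constraints; Node is a hypothetical stand-in, and the loop simply takes the first candidates that fit (real HDFS picks randomly):

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaPlacementSketch {
    record Node(String name, String rack) {}

    // Choose 3 replica locations: the writer's node first (if in the cluster),
    // then two different nodes on a single rack other than the first one's.
    static List<Node> place(Node writer, List<Node> cluster) {
        List<Node> chosen = new ArrayList<>();
        chosen.add(cluster.contains(writer) ? writer : cluster.get(0));
        for (Node n : cluster) {
            if (chosen.size() == 3) break;
            if (chosen.contains(n)) continue;                     // only one replica per datanode
            if (n.rack().equals(chosen.get(0).rack())) continue;  // leave the first replica's rack
            if (chosen.size() == 2 && !n.rack().equals(chosen.get(1).rack()))
                continue;                                         // third replica joins the second's rack
            chosen.add(n);                                        // at most two replicas per rack
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
            new Node("dn1", "rackA"), new Node("dn2", "rackA"),
            new Node("dn3", "rackB"), new Node("dn4", "rackB"));
        System.out.println(place(new Node("dn1", "rackA"), cluster));
        // First replica lands on rackA (the writer), the next two on rackB.
    }
}
```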