File Distribution in HDFS
- Overview of HDFS:-
- HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data.
- HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients.
- In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files.
- Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes.
- The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
First, an HDFS client (the system that gives commands to HDFS) issues a command to write or read data/files.
- In the HDFS architecture there is only one namenode, and it coordinates everything in the cluster.
- The datanodes store all the data. There are many datanodes, sometimes even thousands.
- Suppose the client wants to write 200 MB of data into HDFS; the client then does two things.
Writing the data into HDFS:
- It divides the data into 128 MB blocks.
- It copies each block to three places.
- A client should know two things:
- Block size:- Larger files are divided into blocks, usually of 64 MB or 128 MB.
- Replication factor:- Each block is stored in multiple locations.
- Then the client sends the command to the namenode to write a 128 MB block with a replication factor of 3 (a minimal client-side write sketch follows this list).
- Then the namenode finds 3 datanodes for this client, based on the replication factor.
- The namenode replies with the addresses of the 3 datanodes, sorted in increasing distance from the client.
- The client then sends the data to the first datanode only, and that datanode stores the data on its hard drive.
- While the datanode is receiving the data, it forwards the same data to the next datanode.
- This process goes on until all 3 datanodes have the data; note that the client itself sends the data to the first datanode only.
- After all blocks are written, the client closes the file. The namenode then stores all the metadata in persistent storage (on its hard disk).
- In short, the client divides the file into blocks and asks the namenode for the addresses of the datanodes.
- All the datanodes store data via a replication pipeline.
- While the first datanode is receiving the data from the client, it simultaneously writes the data to the next datanode, and so on, depending upon the replication factor. This is called the “Replication Pipeline”.
- Client:- Divides the file into blocks.
- Namenode:- Provides the addresses of the datanodes.
- Datanode:- Stores data via the replication pipeline.
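To make the write path concrete, here is a minimal sketch using the Hadoop Java client (org.apache.hadoop.fs.FileSystem). The path, buffer size, and sizes are illustrative, and fs.defaultFS is assumed to point at your cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // assumes fs.defaultFS is set, e.g. hdfs://namenode:9000
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/example.txt");   // hypothetical path
        short replication = 3;                      // replication factor
        long blockSize = 128L * 1024 * 1024;        // 128 MB block size

        // create(path, overwrite, bufferSize, replication, blockSize)
        try (FSDataOutputStream out = fs.create(file, true, 4096, replication, blockSize)) {
            out.writeBytes("hello HDFS");           // streamed to the datanodes via the replication pipeline
        }
        // Closing the stream completes the write; the namenode then persists the file's metadata.
    }
}
```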
Reading files from HDFS:
- For reading a file from HDFS, the client first gives information about the file (filename, etc.) to the namenode.
- Then the namenode replies with:
- A list of all blocks for this file.
- A list of datanodes for each block.
- Then the client knows how many blocks to download and the datanodes where each block is stored (a minimal read sketch follows this list).
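A read-side sketch with the same client API; getFileBlockLocations is the call that returns the block list and the datanodes for each block (the path is again hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/example.txt");   // hypothetical path

        // Ask the namenode which blocks make up the file and where each one lives.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset() + " hosts=" + String.join(",", b.getHosts()));
        }

        // Then stream the actual bytes from the datanodes.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n));
        }
    }
}
```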
Fault Tolerance:
- Types of faults and their detection:
- Node failure.
- Communication failure.
- Data corruption.
- Node failure:
- If the namenode is dead, the entire cluster is dead. The namenode is a single point of failure.
Detecting Datanode Failure:-
- Every datanode sends a heartbeat message every 3 seconds.
- If the namenode does not get a heartbeat for about 10 minutes, it considers the datanode dead (see the timeout sketch below).
- It may actually be a network failure, but the namenode treats a dead datanode and a broken network the same way.
- Whenever data is sent, an acknowledgment is expected in reply; if no acknowledgment arrives (after several retries), the sender assumes that the host is dead or the network has failed.
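The “10 minutes” above comes from the namenode's timeout rule. As a sketch, assuming the stock defaults of dfs.heartbeat.interval = 3 s and dfs.namenode.heartbeat.recheck-interval = 5 min, the commonly cited formula works out like this:

```java
public class DeadNodeTimeout {
    public static void main(String[] args) {
        long heartbeatIntervalSec = 3;          // dfs.heartbeat.interval (default 3 s)
        long recheckIntervalMs = 5 * 60_000L;   // dfs.namenode.heartbeat.recheck-interval (default 5 min)

        // Commonly cited rule: timeout = 2 * recheck-interval + 10 * heartbeat-interval
        long timeoutMs = 2 * recheckIntervalMs + 10 * heartbeatIntervalSec * 1000;
        System.out.println("dead-node timeout = " + timeoutMs / 60_000.0 + " minutes");  // 10.5 minutes
    }
}
```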
Corrupted Data:
- A checksum is sent along with all transmitted data.
- Whenever the datanode stores the data, it also stores the checksum (a checksum sketch follows this list).
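A sketch of per-chunk checksumming, assuming the default chunk size of 512 bytes (dfs.bytes-per-checksum). Real HDFS uses CRC32C; plain java.util.zip.CRC32 keeps this sketch dependency-free:

```java
import java.util.zip.CRC32;

public class ChunkChecksums {
    static final int BYTES_PER_CHECKSUM = 512;   // HDFS default (dfs.bytes-per-checksum)

    // Compute one CRC32 per 512-byte chunk of block data.
    public static long[] checksums(byte[] data) {
        int n = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[n];
        for (int i = 0; i < n; i++) {
            CRC32 crc = new CRC32();
            int from = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - from);
            crc.update(data, from, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] block = new byte[1500];           // pretend block data
        long[] stored = checksums(block);        // stored alongside the data
        block[700] ^= 1;                         // simulate bit rot on disk
        long[] now = checksums(block);
        for (int i = 0; i < stored.length; i++)
            if (stored[i] != now[i])
                System.out.println("chunk " + i + " is corrupted");
    }
}
```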
Detecting a corrupted hard drive:
- Periodically, every datanode sends a BlockReport to the namenode.
- BlockReport:- the list of all blocks the datanode has.
- Before sending the report, the datanode checks whether the checksums are OK; it does not send info for blocks that are corrupted.
- So all datanodes send heartbeats every 3 seconds to say they are alive, and periodically send BlockReports as well. Because corrupted blocks are skipped (detected via their checksums), the namenode learns which blocks are lost.
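A toy model of the report-building step: the datanode recomputes each block's checksum and includes only the blocks that still verify, so the namenode can infer the rest are lost. StoredBlock and its fields are hypothetical stand-ins, not HDFS types:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockReportSketch {
    // Hypothetical stand-in: a block on disk with its stored and freshly recomputed checksums.
    record StoredBlock(long id, long storedChecksum, long recomputedChecksum) {}

    // Build the BlockReport: include only blocks whose checksums still match.
    static List<Long> buildBlockReport(List<StoredBlock> local) {
        List<Long> report = new ArrayList<>();
        for (StoredBlock b : local) {
            if (b.storedChecksum() == b.recomputedChecksum()) {
                report.add(b.id());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<StoredBlock> disk = List.of(
            new StoredBlock(1, 111, 111),
            new StoredBlock(2, 222, 999));          // block 2 failed verification
        System.out.println(buildBlockReport(disk)); // prints [1]
    }
}
```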
Fault Tolerance in HDFS:
- Handling write failures:
The datanodes write a block in smaller units, usually 64 KB, called packets. Moreover, each datanode replies with an acknowledgment for each packet to confirm that it got it. If the client does not get an acknowledgment from some datanode, it knows that the datanode is dead, and it adjusts the pipeline to skip that datanode. The block's data is then under-replicated, but the namenode will take care of that later on.
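A toy model of this packet/acknowledgment loop (Datanode here is a hypothetical stand-in, not the HDFS class): any node that fails to acknowledge a packet is dropped from the pipeline, leaving the block under-replicated for the namenode to repair later:

```java
import java.util.ArrayList;
import java.util.List;

public class WritePipelineSketch {
    // Hypothetical datanode stand-in: ack() returns false if the node never acknowledges.
    interface Datanode { String name(); boolean ack(byte[] packet); }

    // Send one ~64 KB packet down the pipeline; drop any node that fails to ack.
    static List<Datanode> sendPacket(List<Datanode> pipeline, byte[] packet) {
        List<Datanode> alive = new ArrayList<>();
        for (Datanode dn : pipeline) {
            if (dn.ack(packet)) {
                alive.add(dn);
            } else {
                System.out.println(dn.name() + " did not ack; skipping it from now on");
            }
        }
        return alive;   // the adjusted pipeline used for subsequent packets
    }

    public static void main(String[] args) {
        Datanode good = new Datanode() {
            public String name() { return "dn1"; }
            public boolean ack(byte[] p) { return true; }
        };
        Datanode dead = new Datanode() {
            public String name() { return "dn2"; }
            public boolean ack(byte[] p) { return false; }
        };
        List<Datanode> adjusted = sendPacket(List.of(good, dead), new byte[64 * 1024]);
        System.out.println("pipeline now has " + adjusted.size() + " node(s)");
    }
}
```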
- Handling read failures:
When the client asks for the location of a block, the namenode gives the locations of all the datanodes that hold it. If one datanode is dead, the client simply reads from the rest.
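A sketch of that failover logic (Replica is a hypothetical stand-in for "read this block from one datanode"):

```java
import java.io.IOException;
import java.util.List;

public class ReadFailoverSketch {
    // Hypothetical replica source: throws if its datanode is dead or unreachable.
    interface Replica { byte[] read() throws IOException; }

    // Try the datanodes the namenode listed for the block, in order,
    // falling back to the next one on failure.
    static byte[] readBlock(List<Replica> replicas) throws IOException {
        IOException last = null;
        for (Replica r : replicas) {
            try {
                return r.read();
            } catch (IOException e) {
                last = e;   // this datanode failed; try the next one
            }
        }
        throw new IOException("all replicas failed", last);
    }

    public static void main(String[] args) throws IOException {
        Replica dead = () -> { throw new IOException("dn1 unreachable"); };
        Replica alive = () -> new byte[] {42};
        System.out.println(readBlock(List.of(dead, alive)).length);  // prints 1
    }
}
```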
- Handling datanode failures:
The namenode maintains two tables:
- First table:- which blocks of data are replicated on which datanodes (block → datanodes).
- Second table:- which blocks each datanode has (datanode → blocks).
The namenode continuously updates these two tables. If the namenode finds that a block on a datanode is corrupted, it updates the first table by removing the bad datanode from that block's list; and if it finds that a datanode is dead, it updates both tables.
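A sketch of the two tables as plain maps (block id → nodes, node → block ids); the update rules mirror the two cases above:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NamenodeTablesSketch {
    // First table: block -> datanodes holding a replica of it.
    final Map<Long, Set<String>> blockToNodes = new HashMap<>();
    // Second table: datanode -> blocks it stores.
    final Map<String, Set<Long>> nodeToBlocks = new HashMap<>();

    // Case 1: a replica of blockId on `node` is corrupted -> update only the first table.
    void removeCorruptReplica(long blockId, String node) {
        blockToNodes.getOrDefault(blockId, new HashSet<>()).remove(node);
    }

    // Case 2: a datanode died -> update both tables.
    void removeDeadNode(String node) {
        for (long blockId : nodeToBlocks.getOrDefault(node, new HashSet<>())) {
            blockToNodes.getOrDefault(blockId, new HashSet<>()).remove(node);
        }
        nodeToBlocks.remove(node);
    }

    public static void main(String[] args) {
        NamenodeTablesSketch nn = new NamenodeTablesSketch();
        nn.blockToNodes.put(1L, new HashSet<>(Set.of("dn1", "dn2", "dn3")));
        nn.nodeToBlocks.put("dn2", new HashSet<>(Set.of(1L)));
        nn.removeDeadNode("dn2");
        System.out.println(nn.blockToNodes.get(1L));   // dn2 is gone, dn1 and dn3 remain
    }
}
```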
Under-replicated Blocks:-
- The namenode periodically scans the first table (the list of blocks) and sees which blocks are not replicated properly. These are called “under-replicated” blocks.
- For every under-replicated block, the namenode gives the command to a datanode that does not have the block to copy it from a datanode that has a replica (see the scan sketch below).
- HDFS cannot guarantee that at least one replica will always survive, but it tries its best by smartly selecting replica locations.
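A sketch of that periodic scan against the first table, assuming a target replication factor of 3; the names are illustrative:

```java
import java.util.Map;
import java.util.Set;

public class UnderReplicationScan {
    static final int TARGET_REPLICATION = 3;

    // Scan the first table; for each block with too few live replicas,
    // tell a node without the block to copy it from a node that has one.
    static void scan(Map<Long, Set<String>> blockToNodes, Set<String> allNodes) {
        for (Map.Entry<Long, Set<String>> e : blockToNodes.entrySet()) {
            Set<String> holders = e.getValue();
            if (!holders.isEmpty() && holders.size() < TARGET_REPLICATION) {
                String source = holders.iterator().next();
                String target = allNodes.stream()
                        .filter(n -> !holders.contains(n))
                        .findFirst().orElseThrow();
                System.out.printf("block %d under-replicated: tell %s to copy from %s%n",
                        e.getKey(), target, source);
            }
        }
    }

    public static void main(String[] args) {
        // Block 1 has only 2 of its 3 replicas left; dn3 should fetch a copy.
        scan(Map.of(1L, Set.of("dn1", "dn2")), Set.of("dn1", "dn2", "dn3"));
    }
}
```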
- Replica placement strategy:-
- The cluster is divided into racks, and each rack has multiple datanodes. First-replica allocation is simple: if the writer is a member of the cluster, it is selected for the first replica; otherwise some datanode is selected.
- Next two replica locations:-
- Pick a different rack than the first replica's, and select two different datanodes on that rack.
- Subsequent replica locations:-
- Pick any random datanode that satisfies the two conditions below (a placement sketch follows this list):
- Only one replica per datanode.
- At most two replicas per rack.
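A sketch of this placement policy under the two constraints; Node is a hypothetical stand-in, and the loop simply takes the first candidates that fit (real HDFS picks randomly):

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaPlacementSketch {
    record Node(String name, String rack) {}

    // Choose 3 replica locations: the writer's node first (if in the cluster),
    // then two different nodes on a single rack other than the first one's.
    static List<Node> place(Node writer, List<Node> cluster) {
        List<Node> chosen = new ArrayList<>();
        chosen.add(cluster.contains(writer) ? writer : cluster.get(0));
        for (Node n : cluster) {
            if (chosen.size() == 3) break;
            if (chosen.contains(n)) continue;                     // only one replica per datanode
            if (n.rack().equals(chosen.get(0).rack())) continue;  // leave the first replica's rack
            if (chosen.size() == 2 && !n.rack().equals(chosen.get(1).rack()))
                continue;                                         // third replica joins the second's rack
            chosen.add(n);                                        // at most two replicas per rack
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
            new Node("dn1", "rackA"), new Node("dn2", "rackA"),
            new Node("dn3", "rackB"), new Node("dn4", "rackB"));
        System.out.println(place(new Node("dn1", "rackA"), cluster));
        // First replica lands on rackA (the writer), the next two on rackB.
    }
}
```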