NameNode (Single Instance):
- The NameNode holds the file system namespace, i.e. the metadata. Any change to the file system or to how data is stored is tracked by the NameNode. For example, if a file is deleted from HDFS or otherwise modified, the NameNode records the change in its edit log.
- It instructs the DataNodes to carry out block operations such as creating, deleting, and replicating blocks.
- It maintains a record of how the files in HDFS are split into blocks and where those blocks are stored.
- It receives heartbeats and block reports from the DataNodes. Based on these, it decides when blocks need to be re-replicated to maintain the replication factor.
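
The bookkeeping described above can be sketched with a small toy model. This is only an illustration, not the Hadoop implementation: the class `ToyNameNode`, its method names, and the `heartbeat_timeout` value are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model of NameNode bookkeeping (hypothetical, not Hadoop code)."""

    def __init__(self, replication=3, heartbeat_timeout=30.0):
        self.replication = replication
        self.heartbeat_timeout = heartbeat_timeout  # assumed value, seconds
        self.namespace = {}                      # file path -> list of block ids
        self.edit_log = []                       # append-only record of changes
        self.block_locations = defaultdict(set)  # block id -> DataNode ids
        self.last_heartbeat = {}                 # DataNode id -> last timestamp

    def create_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)
        self.edit_log.append(("CREATE", path, tuple(block_ids)))

    def delete_file(self, path):
        for block_id in self.namespace.pop(path):
            self.block_locations.pop(block_id, None)
        self.edit_log.append(("DELETE", path))

    def heartbeat(self, node_id, now):
        self.last_heartbeat[node_id] = now

    def block_report(self, node_id, block_ids):
        # A block report replaces everything known about this node's blocks.
        for holders in self.block_locations.values():
            holders.discard(node_id)
        for block_id in block_ids:
            self.block_locations[block_id].add(node_id)

    def live_nodes(self, now):
        return {node for node, seen in self.last_heartbeat.items()
                if now - seen <= self.heartbeat_timeout}

    def under_replicated(self, now):
        """Map each block with too few live replicas to its live replica count."""
        live = self.live_nodes(now)
        result = {}
        for block_ids in self.namespace.values():
            for block_id in block_ids:
                count = len(self.block_locations.get(block_id, set()) & live)
                if count < self.replication:
                    result[block_id] = count
        return result
```

In this sketch, a DataNode that stops sending heartbeats drops out of `live_nodes`, so its replicas no longer count and the affected blocks show up in `under_replicated`, which is the point at which re-replication would be scheduled.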
Secondary NameNode (Single Instance):
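- It acts as a checkpointing helper for the NameNode. It is not a hot standby that takes over on failure, but its checkpoint can be used to help recover the namespace.
- It keeps a recent copy of the namespace image by periodically merging the edit log into it.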
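
The idea of merging the edit log into the namespace image can be sketched as follows. This is a simplified illustration under assumed data shapes: the image is a plain dict of path to block ids, and the edit tuples (`CREATE`, `DELETE`, `RENAME`) and the function names `apply_edits` and `checkpoint` are hypothetical, not Hadoop's on-disk formats or APIs.

```python
def apply_edits(image, edits):
    """Return a new namespace image with the edit-log entries applied in order."""
    merged = dict(image)  # leave the old image untouched
    for op, path, *args in edits:
        if op == "CREATE":
            merged[path] = list(args[0])        # args[0]: block ids of the file
        elif op == "DELETE":
            merged.pop(path, None)
        elif op == "RENAME":
            merged[args[0]] = merged.pop(path)  # args[0]: new path
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return merged


def checkpoint(image, edit_log):
    """Fold the edit log into the image; return the new image and an empty log."""
    return apply_edits(image, edit_log), []
```

After a checkpoint the edit log starts empty again, so replaying it at the next NameNode start-up takes less time than replaying every change since the file system was created.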
DataNode (Multiple Instances):
- The DataNode is responsible for storing the file blocks in HDFS.
- It manages the file blocks within the node. It sends information to the NameNode about the files and blocks stored in that node and responds to the NameNode for all filesystem operations.
- DataNodes send a heartbeat to the NameNode every 3 seconds (the default interval) to signal that they are alive and working.
- DataNodes also enable pipelining of data: each node forwards the data it receives to the next node in the pipeline.
- DataNodes can talk to each other to rebalance data, move and copy data around, and keep replication at the required level.
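
The pipelining behaviour can be sketched with a toy model in which each node stores a block and then hands it to the next node. This is a simplification: real HDFS streams a block in small packets and sends acknowledgements back along the pipeline, whereas here the whole block is passed in one call. The class `ToyDataNode` and its `receive` method are hypothetical names.

```python
class ToyDataNode:
    """Toy DataNode that stores blocks in memory (hypothetical, not Hadoop code)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}  # block id -> bytes

    def receive(self, block_id, data, downstream):
        """Store the block, forward it downstream, and return the ids of
        all nodes that stored it, in pipeline order."""
        self.blocks[block_id] = data
        if downstream:
            next_node, rest = downstream[0], downstream[1:]
            stored_downstream = next_node.receive(block_id, data, rest)
        else:
            stored_downstream = []
        return [self.node_id] + stored_downstream


# The client only talks to the first node; the nodes replicate among themselves.
dn1, dn2, dn3 = ToyDataNode("dn1"), ToyDataNode("dn2"), ToyDataNode("dn3")
stored_on = dn1.receive("blk_1", b"hello", [dn2, dn3])
```

The design point is that the client writes each block once, and the replication cost is spread across the DataNodes instead of the client sending three separate copies.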