[Hadoop] HDFS Architecture

What is HDFS Architecture?

  • The Hadoop Distributed File System (HDFS) follows a master-worker architecture. The master is the NameNode, and the workers are the DataNodes, which run on low-cost commodity hardware and store the actual data. A cluster has a single NameNode and multiple DataNodes.

What is the task of the NameNode?

  • The NameNode stores the file-system metadata, along with other information about the DataNodes. It is also responsible for:
    • Managing the file-system namespace.
    • Controlling client access to the data blocks.
    • Periodically checking the availability of the DataNodes.
    • Maintaining the replication factor of the data blocks.
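
To make the NameNode's bookkeeping concrete, here is a minimal Python sketch of the idea. It is a toy model, not HDFS code: the class `ToyNameNode`, its methods, and the block and node names are all hypothetical, and the replication factor of 3 is assumed because it is the usual HDFS default.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model of NameNode metadata (hypothetical, not the HDFS API)."""

    replication_factor: int = 3  # assumed target number of replicas per block
    # file path -> ordered list of block IDs (the namespace)
    namespace: dict = field(default_factory=dict)
    # block ID -> set of DataNode names that hold a replica
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, block_ids):
        """Record a file and the blocks it is split into."""
        self.namespace[path] = list(block_ids)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set())

    def report_replica(self, block_id, datanode):
        """Record that a DataNode holds a replica of a block."""
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def under_replicated(self):
        """Return the blocks that have fewer replicas than the target."""
        return sorted(
            block_id
            for block_id, nodes in self.block_locations.items()
            if len(nodes) < self.replication_factor
        )


nn = ToyNameNode(replication_factor=3)
nn.add_file("/logs/app.log", ["blk_1", "blk_2"])
for dn in ("dn1", "dn2", "dn3"):
    nn.report_replica("blk_1", dn)
nn.report_replica("blk_2", "dn1")
print(nn.under_replicated())  # ['blk_2']
```

Note that the NameNode in this sketch holds only metadata (which blocks make up a file and where the replicas live); the block contents themselves never pass through it.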

What is the task of the DataNodes?

  • DataNodes are the main storage for the data. Hadoop runs them on low-cost hardware.
  • DataNodes store the data blocks and create, replicate, and delete them as instructed by the NameNode.
  • Each DataNode periodically sends a health report (a heartbeat) to the NameNode. The default interval is 3 seconds.
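
The heartbeat mechanism can be sketched in a few lines of Python. This is an illustration only: `HeartbeatMonitor` and its methods are hypothetical names, timestamps are passed in explicitly instead of read from a clock, and the 30-second timeout is an arbitrary value chosen for the example, not an HDFS setting.

```python
HEARTBEAT_INTERVAL_S = 3  # default heartbeat interval mentioned above


class HeartbeatMonitor:
    """Toy NameNode-side tracker of DataNode heartbeats (hypothetical)."""

    def __init__(self, timeout_s):
        # A node is treated as dead after this many seconds of silence.
        self.timeout_s = timeout_s
        self.last_seen = {}  # DataNode name -> time of last heartbeat

    def heartbeat(self, datanode, now):
        """Record a heartbeat received from a DataNode at time `now`."""
        self.last_seen[datanode] = now

    def live_nodes(self, now):
        return sorted(
            dn for dn, t in self.last_seen.items() if now - t <= self.timeout_s
        )

    def dead_nodes(self, now):
        return sorted(
            dn for dn, t in self.last_seen.items() if now - t > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=30)  # illustrative timeout
for now in range(0, 61, HEARTBEAT_INTERVAL_S):
    monitor.heartbeat("dn1", now)      # dn1 reports every 3 seconds
    if now <= 9:
        monitor.heartbeat("dn2", now)  # dn2 goes silent after t=9

print(monitor.live_nodes(now=60))  # ['dn1']
print(monitor.dead_nodes(now=60))  # ['dn2']
```

Once a DataNode is considered dead, the blocks it held become candidates for re-replication on other nodes, which is how the availability check ties back to the replication factor.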

What is the Secondary NameNode?

  • Secondary NameNode: The Secondary NameNode is a dedicated node that takes checkpoints of the file-system metadata. It is not a substitute for the primary NameNode: it assists the NameNode but cannot replace it.
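
In HDFS, a checkpoint merges the saved namespace image (the fsimage) with the log of changes made since that image was written (the edit log). The Python sketch below illustrates that merge under simplifying assumptions: the namespace is a plain dict, edits are tuples, and the function names `apply_edit` and `checkpoint` are hypothetical, not part of Hadoop.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace dict (path -> block IDs)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the image.

    Returns the new image and an empty edit log, leaving the inputs untouched.
    """
    new_image = dict(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


fsimage = {"/a.txt": ["blk_1"]}
edits = [
    ("create", "/b.txt", ["blk_2"]),
    ("delete", "/a.txt"),
]
new_image, remaining = checkpoint(fsimage, edits)
print(new_image)  # {'/b.txt': ['blk_2']}
```

The benefit of checkpointing is a shorter edit log: on restart, the NameNode has fewer logged operations to replay on top of the latest image.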
