Apache Hadoop has two core components, MapReduce and HDFS, originally derived from Google’s MapReduce and Google File System (GFS) papers respectively.
Hadoop is mainly used to store data (HDFS) and process it (MapReduce).
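To make the processing side concrete, here is an illustrative sketch only: a pure-Python simulation of the classic MapReduce word-count flow (map, shuffle/group, reduce). It does not use the Hadoop API; the function names are hypothetical stand-ins for the Mapper, shuffle step, and Reducer.

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs, like a Hadoop Mapper would.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, like the shuffle/sort step between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts per word, like a Hadoop Reducer would.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hello hadoop", "hello hdfs"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'hello': 2, 'hadoop': 1, 'hdfs': 1}
```

In real Hadoop the map and reduce tasks run in parallel on different slave nodes, and the shuffle happens over the network; this sketch collapses all of that into one process to show the data flow.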
Apache Hadoop ecosystem image from www.mssqltips.com
1. HDFS – Hadoop Distributed File System
- HDFS is a file system specially designed for storing huge data sets on a cluster of commodity hardware with a streaming access pattern.
- Just as Java follows “write once, run anywhere” across N platforms, Hadoop follows the “WORM” concept.
- WORM – Write Once, Read Many times: once a file is written to HDFS, it is read many times without modification.
- Hadoop Core Services:
- Name Node
- Secondary Name Node
- Job Tracker
- Data Node
- Task Tracker
- Name Node, Secondary Name Node, and Job Tracker —-> Master node.
- Data Node and Task Tracker —> Slave nodes.
- Data is divided into blocks of 64 MB by default (or 128 MB).
- Each block is replicated three times by default, as a backup for fault tolerance.
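The block-splitting and replication arithmetic can be sketched as follows. This is an illustrative calculation only, not the HDFS implementation; the names `BLOCK_SIZE`, `REPLICATION`, and `num_blocks` are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB default block size
REPLICATION = 3                 # default replication factor

def num_blocks(file_size):
    # Ceiling division: the last block may be smaller than BLOCK_SIZE.
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE

file_size = 200 * 1024 * 1024        # a 200 MB file
blocks = num_blocks(file_size)       # 4 blocks: 64 + 64 + 64 + 8 MB
total_copies = blocks * REPLICATION  # 12 block replicas across Data Nodes
print(blocks, total_copies)          # 4 12
```

So a 200 MB file occupies 4 blocks, and with 3x replication the cluster stores 12 block copies spread over the Data Nodes.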
- The Name Node holds the file system metadata.
- Files are broken into blocks and spread across Data Nodes.
- HDFS reads huge data sets sequentially after a single seek.
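The streaming access pattern can be pictured as one seek to the start of a block followed by large sequential reads, rather than many small random seeks. A minimal sketch, using an in-memory buffer as a stand-in for an HDFS block (not actual HDFS I/O):

```python
import io

data = io.BytesIO(b"x" * 1000)  # stand-in for one stored block
data.seek(0)                    # single seek to the block start
chunk_size = 256
chunks = 0
while True:
    chunk = data.read(chunk_size)  # sequential buffered reads
    if not chunk:
        break
    chunks += 1
print(chunks)  # 4
```

Because each seek is expensive on spinning disks, reading a whole 64 MB block sequentially after one seek is far cheaper than seeking once per small record, which is why HDFS favors large blocks and streaming reads.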