Apache Hadoop has two core components, MapReduce and HDFS, originally derived from Google’s MapReduce and Google File System (GFS) papers respectively.
Hadoop is mainly used to store data (HDFS) and process it (MapReduce).
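To make the processing side concrete, here is an illustrative sketch only: a pure-Python simulation of the classic MapReduce word-count flow (map, shuffle/group, reduce). It does not use the Hadoop API; the function names are hypothetical stand-ins for the Mapper, shuffle step, and Reducer.

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs, like a Hadoop Mapper would.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, like the shuffle/sort step between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts per word, like a Hadoop Reducer would.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hello hadoop", "hello hdfs"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'hello': 2, 'hadoop': 1, 'hdfs': 1}
```

In real Hadoop the map and reduce tasks run in parallel on different slave nodes, and the shuffle happens over the network; this sketch collapses all of that into one process to show the data flow.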
Apache Hadoop ecosystem image from www.mssqltips.com
1. HDFS – Hadoop Distributed File System
- HDFS is a file system specially designed for storing huge data sets on a cluster of commodity hardware with a streaming access pattern.
- Just as Java follows “write once, run anywhere” across N platforms, Hadoop follows the “WORM” concept.
- WORM – Write Once, Read Many times: once a file is written to HDFS, it is read many times without modification.
- Hadoop Core Services:
- Name Node
- Secondary Name Node
- Job Tracker
- Data Node
- Task Tracker
- Name Node, Secondary Name Node, and Job Tracker —-> Master node.
- Data Node and Task Tracker —> Slave nodes.
- Data is divided into blocks of 64 MB by default (or 128 MB).
- Each block is replicated three times by default, as a backup for fault tolerance.
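The block-splitting and replication arithmetic can be sketched as follows. This is an illustrative calculation only, not the HDFS implementation; the names `BLOCK_SIZE`, `REPLICATION`, and `num_blocks` are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB default block size
REPLICATION = 3                 # default replication factor

def num_blocks(file_size):
    # Ceiling division: the last block may be smaller than BLOCK_SIZE.
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE

file_size = 200 * 1024 * 1024        # a 200 MB file
blocks = num_blocks(file_size)       # 4 blocks: 64 + 64 + 64 + 8 MB
total_copies = blocks * REPLICATION  # 12 block replicas across Data Nodes
print(blocks, total_copies)          # 4 12
```

So a 200 MB file occupies 4 blocks, and with 3x replication the cluster stores 12 block copies spread over the Data Nodes.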
- The Name Node holds the file system metadata.
- Files are broken into blocks and spread across Data Nodes.
- HDFS reads huge data sets sequentially after a single seek.
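The streaming access pattern can be pictured as one seek to the start of a block followed by large sequential reads, rather than many small random seeks. A minimal sketch, using an in-memory buffer as a stand-in for an HDFS block (not actual HDFS I/O):

```python
import io

data = io.BytesIO(b"x" * 1000)  # stand-in for one stored block
data.seek(0)                    # single seek to the block start
chunk_size = 256
chunks = 0
while True:
    chunk = data.read(chunk_size)  # sequential buffered reads
    if not chunk:
        break
    chunks += 1
print(chunks)  # 4
```

Because each seek is expensive on spinning disks, reading a whole 64 MB block sequentially after one seek is far cheaper than seeking once per small record, which is why HDFS favors large blocks and streaming reads.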