Pig is a high-level data-flow scripting language (Pig Latin) that runs on a Hadoop cluster.
Why Was the “Pig” Component Added to Hadoop?
Users face some common challenges when customizing or extending MapReduce programs written in Java.
Adding new logic to the map, split, and reduce phases increases the time it takes to get a job into production. For example, to process unstructured data that contains unwanted values in each row, we must add extra cleaning logic to the MapReduce program to remove those values before the data can be processed, and writing, compiling, and testing that Java code delays production.
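To make that cost concrete, here is a minimal Python sketch (not Java, and not Hadoop's API) that simulates the phases of a hand-written MapReduce job in memory. The tab-separated "user, action" rows, the "NULL" placeholder, and the names mapper, shuffle, reducer, and run_job are all hypothetical. What it illustrates is that the row-cleaning rules have to be coded into the map function itself, next to the job's real logic.

```python
from collections import defaultdict

# Hypothetical raw rows in "user<TAB>action" form; some carry unwanted values.
RAW_ROWS = [
    "alice\tlogin",
    "bob\tNULL",      # placeholder value we want to drop
    "\tlogin",        # missing user
    "alice\tpost",
    "carol",          # missing the action field
    "bob\tpost",
]

def mapper(line):
    """Map phase: the cleaning logic must live here, mixed into the job code."""
    fields = line.split("\t")
    if len(fields) != 2:
        return  # drop malformed rows
    user, action = fields
    if not user or action in ("", "NULL"):
        return  # drop rows with unwanted values
    yield user, 1

def shuffle(pairs):
    """Shuffle/sort phase: group every emitted value by its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reducer(key, values):
    """Reduce phase: count the valid actions per user."""
    return key, sum(values)

def run_job(rows):
    mapped = (pair for row in rows for pair in mapper(row))
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())

print(run_job(RAW_ROWS))  # {'alice': 2, 'bob': 1}
```

Every new cleaning rule means editing, recompiling, and redeploying the mapper, which is the overhead Pig is meant to remove.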
Sample data:
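With Pig, the filtering logic is written as a short Pig Latin script instead. Pig compiles the script into one or more MapReduce jobs (packaged as a JAR) and runs them on the cluster, so the unwanted values are removed as a step in the data flow without hand-writing Java map and reduce code.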
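The difference in style can be sketched in Python with a few hypothetical helpers that loosely mimic Pig Latin's LOAD, FILTER ... BY, and FOREACH ... GENERATE operators. These helpers are not a Pig API; they only show that cleaning becomes one declarative step in a chain of relations rather than code buried inside a mapper. The sample rows are made up for illustration.

```python
# Hypothetical helpers that loosely mimic Pig Latin relational operators.
# They are NOT part of Pig; they only illustrate the data-flow style.

def load(lines, sep="\t"):
    """Like LOAD with PigStorage (tab-delimited by default): lines -> tuples."""
    return [tuple(line.split(sep)) for line in lines]

def filter_by(relation, predicate):
    """Like FILTER relation BY condition: keep only the matching tuples."""
    return [t for t in relation if predicate(t)]

def foreach_generate(relation, fn):
    """Like FOREACH relation GENERATE ...: transform each tuple."""
    return [fn(t) for t in relation]

# Hypothetical raw rows in "user<TAB>action" form.
RAW_ROWS = ["alice\tlogin", "bob\tNULL", "\tlogin", "alice\tpost", "carol", "bob\tpost"]

def is_clean(t):
    """The cleaning rule: two fields, a non-empty user, and a real action."""
    return len(t) == 2 and t[0] != "" and t[1] not in ("", "NULL")

raw = load(RAW_ROWS)
clean = filter_by(raw, is_clean)                    # one declarative step
users = foreach_generate(clean, lambda t: t[0])     # project the user field
print(users)  # ['alice', 'alice', 'bob']
```

Changing the cleaning rule here means editing one predicate, which is roughly the convenience a Pig Latin FILTER statement gives over a recompiled Java mapper.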
What Is Pig Useful For?
Pig Latin use cases tend to fall into three separate categories:
1. ETL data pipelines
2. Research on raw data
3. Iterative processing
A common ETL example is a company bringing in logs from its web servers (or other raw text such as Facebook comments and chat logs), cleaning the data, and precomputing common aggregates before loading the results into its data warehouse.
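The extract, clean, and aggregate steps of that pipeline can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not Pig itself: the "page status" log format, the sample lines, and the function names extract, clean, and aggregate are hypothetical, and the final load into a warehouse is left out. The aggregate step plays the role that GROUP ... BY followed by COUNT would play in a Pig Latin script.

```python
from collections import Counter

def extract(lines):
    """Extract: parse hypothetical 'page status' lines, skipping malformed ones."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            yield parts[0], int(parts[1])

def clean(records):
    """Clean: keep only successful (HTTP 200) requests."""
    return (page for page, status in records if status == 200)

def aggregate(pages):
    """Precompute an aggregate: hits per page, sorted by page."""
    return dict(sorted(Counter(pages).items()))

# Hypothetical web-server log lines.
LOG_LINES = [
    "/home 200",
    "/home 200",
    "/about 404",
    "garbage",
    "/about 200",
    "/home 500",
]

summary = aggregate(clean(extract(LOG_LINES)))
print(summary)  # {'/about': 1, '/home': 2}
```

Precomputing a small summary like this means the warehouse stores ready-made aggregates instead of raw, noisy log lines.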
Pig Work Flow