Wednesday, March 16, 2016

Secondary Sort example in Pig

Problem: Get the month wise temparature in descending order

Input Data: SecodSort.txt
2012, 01, 01, 5
2012, 01, 02, 45
2012, 01, 03, 35
2012, 01, 04, 10
2001, 11, 01, 46
2001, 11, 02, 47
2001, 11, 03, 48
2001, 11, 04, 40
2005, 08, 20, 50
2005, 08, 21, 52
2005, 08, 22, 38
2005, 08, 23, 70


//Secondary sort
A = LOAD '/Users/surjanrawat/Documents/SecodSort.txt' using PigStorage(',') as (year:long,month:long,date:long,temp:long);
B = foreach A generate year,month,temp;
C = group B by (year, month);
D = foreach C {
X = ORDER B by temp desc;
Y  =  foreach X generate $2;
generate flatten(group),BagToString(Y,',');
};
Dump D;


Output
--------
(2001,11,48,47,46,40)
(2005,8,70,52,50,38)

(2012,1,45,35,10,5)

Tuesday, March 15, 2016

How is Pig job translated / converted to MapReduce Step by Step process

How is Pig jobs translated /converted to MapReduce Step by Step process

Refer to the below link for details about how is Pig Script translated/Converted to MapReduce.

Excerpt from the link.
The Pig system takes a Pig Latin program as input, compiles it into one or more Map-Reduce jobs, and then executes those jobs on a given Hadoop cluster. 


Any Pig script/program whether its running in local mode or MapReduce Mode  goes through a series of transformation steps before being executed.

Steps: