Friday, November 4, 2016

Pig UDF Error input.get(0) : The type org.apache.hadoop.io.WritableComparable cannot be resolved. It is indirectly referenced from required .class files

Error: The type org.apache.hadoop.io.WritableComparable cannot be resolved. It is indirectly referenced from required .class files. This error shows up in a Pig UDF when trying to read the value of a field from the input Tuple.


public String exec(Tuple input) throws IOException {
    if (null == input || input.size() == 0) {
        return null;
    }
    try {
        // this line gives the compilation error in Eclipse
        epochTime = Long.parseLong((String) input.get(0));
    } catch (Exception ex) {
        throw new IOException("Caught exception processing input " + input, ex);
    }
    ...
}

Solution: For a Maven project, the dependencies below need to be present to resolve the above compilation error.

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.7.1</version>
    <!-- <scope>provided</scope> -->
</dependency>

<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.11.0-cdh4.7.1</version>
    <!-- <scope>provided</scope> -->
</dependency> 

Sometimes the same error message is seen for other missing jars, such as commons-logging or another Hadoop-related jar. In that case, find the jar that contains the missing class and add the corresponding Maven dependency for it.
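
For reference, here is a complete, compilable sketch of a UDF along the lines of the snippet above. The class name, the epoch-milliseconds assumption, and the date formatting are illustrative assumptions, not details from the original UDF.

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF: converts an epoch-time string (milliseconds) into a date string.
public class EpochToDate extends EvalFunc<String> {

    private long epochTime;

    @Override
    public String exec(Tuple input) throws IOException {
        if (null == input || input.size() == 0) {
            return null;
        }
        try {
            // Without hadoop-common on the classpath this line fails to compile in Eclipse,
            // because Pig's Tuple/EvalFunc classes indirectly reference WritableComparable.
            epochTime = Long.parseLong((String) input.get(0));
        } catch (Exception ex) {
            throw new IOException("Caught exception processing input " + input, ex);
        }
        return new SimpleDateFormat("yyyy-MM-dd").format(new Date(epochTime));
    }
}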

Tuesday, June 21, 2016

Scala Beginners Issues


1. Compiling / running Scala from the command line, just like javac / java

scalac com/PrintMesgObj.scala 
scala -classpath . com.PrintMesgObj

2. Common Errors:

package com

class PrintMsg {

  def main(args: Array[String]) = {
    println("Hello World !!")
  }
}

If you compile and run the above code expecting "Hello World !!" to be printed, it won't be. See the output:
surjanrawat$ scalac com/PrintMsg.scala 
surjanrawat$ scala -classpath . com.PrintMsg
java.lang.NoSuchMethodException: com.PrintMsg.main is not static
at scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:68)

Reason: PrintMsg is a class, so its main method is not static; the scala runner needs an object that defines main. To use this class, we need to define an object that wraps it.

Fix: Define an object for PrintMsg:
package com

object PrintMesgObj {
   val obj = new PrintMsg
   def main(args:Array[String])= {
     obj.main(args)
   }
}

surjanrawat$ scala -classpath . com.PrintMesgObj surjan
Hello World !!

Common Errors:
1. java.lang.NoSuchMethodException: com.PrintMsg.main is not static
   at scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:68)
   Seen when main is defined inside a class rather than an object.

2. scala -classpath . com.PrintMesgObj
   java.lang.NoSuchMethodException: com.PrintMesgObj.main([Ljava.lang.String;)
   Seen when the object does not define a main(args: Array[String]) method.

Tuesday, April 26, 2016

Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java

Problem with a custom key/value class implementing the Writable interface. Error: Caused by: java.io.EOFException at java.io.DataInputStream.readInt.

Solution: If the custom Writable class has Java primitive fields, then write() and readFields() must use the type-specific DataOutput/DataInput methods for each field, and the calls must mirror each other. A common cause of this EOFException is calling DataOutput.write(int), which writes only a single byte, while readFields() reads the field back with readInt(), which expects four bytes, so the reader runs off the end of the serialized data.
For example:
- writeChars(String s) for String
- writeInt(int i) for int
- writeLong(long l) for long
- writeBoolean(boolean b) for boolean

Stack-Trace:
java.lang.Exception: java.io.EOFException
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at com.apple.Comments.MinMaxCountTuple.readFields(MinMaxCountTuple.java:48)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:145)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)


Custom class: I was getting this exception with the custom class below.

public class MinMaxCountTuple implements Writable {
    private int min;
    private int max;
    private int count;

    // Buggy version: DataOutput.write(int) writes only a single byte per field,
    // while readFields() below reads four bytes per field with readInt().
    public void write(DataOutput out) throws IOException {
        out.write(this.min);
        out.write(this.max);
        out.write(this.count);
    }

    public void readFields(DataInput in) throws IOException {
        this.min = in.readInt();
        this.max = in.readInt();
        this.count = in.readInt();
    }
}

// Changed the write method to the below and it fixed the problem:
public void write(DataOutput out) throws IOException {
    out.writeInt(this.min);
    out.writeInt(this.max);
    out.writeInt(this.count);
}
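
As a general pattern, here is a minimal sketch of a custom Writable (the class name and fields are hypothetical) in which every call in write() has a matching call of the same type, in the same order, in readFields(). For the String field, writeUTF()/readUTF() are used here since they form a symmetric pair.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical custom value class: each field is written and read with matching,
// type-specific calls so the byte layout is identical on both sides.
public class EventSummary implements Writable {
    private String name;
    private int count;
    private long totalBytes;
    private boolean active;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.name);        // matched by readUTF()
        out.writeInt(this.count);       // matched by readInt()
        out.writeLong(this.totalBytes); // matched by readLong()
        out.writeBoolean(this.active);  // matched by readBoolean()
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.name = in.readUTF();
        this.count = in.readInt();
        this.totalBytes = in.readLong();
        this.active = in.readBoolean();
    }
}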

Monday, April 11, 2016

Splitting on dot operator in Pig

For input:
101, iOS8.4
102, POS6.7

Expected output:
101, iOS8
102, POS6


A = LOAD '/home/hadoop/work/surjan/token/Test.txt' USING PigStorage(',') AS (id:long, a1:chararray);
B = FOREACH A GENERATE $0, FLATTEN(STRSPLIT(a1,'\\u002E')) AS (a1:chararray, a1of1:chararray);
C = FOREACH B GENERATE $0, a1;

STRSPLIT takes a regular expression as its delimiter, so the dot has to be escaped; '\\u002E' is the escaped Unicode code point for '.', which keeps the dot from being treated as the regex "match any character" (the simpler escape '\\.' also works).

Wednesday, March 16, 2016

Secondary Sort example in Pig

Problem: For each year and month, get the temperatures in descending order.

Input Data: SecodSort.txt
2012, 01, 01, 5
2012, 01, 02, 45
2012, 01, 03, 35
2012, 01, 04, 10
2001, 11, 01, 46
2001, 11, 02, 47
2001, 11, 03, 48
2001, 11, 04, 40
2005, 08, 20, 50
2005, 08, 21, 52
2005, 08, 22, 38
2005, 08, 23, 70


-- Secondary sort
A = LOAD '/Users/surjanrawat/Documents/SecodSort.txt' using PigStorage(',') as (year:long, month:long, date:long, temp:long);
B = foreach A generate year, month, temp;
C = group B by (year, month);
D = foreach C {
    X = ORDER B by temp desc;
    Y = foreach X generate $2;
    generate flatten(group), BagToString(Y, ',');
};
Dump D;
Dump D;


Output
--------
(2001,11,48,47,46,40)
(2005,8,70,52,50,38)
(2012,1,45,35,10,5)

Tuesday, March 15, 2016

How a Pig job is translated/converted to MapReduce, step by step


Refer to the below link for details about how a Pig script is translated/converted to MapReduce.

Excerpt from the link.
The Pig system takes a Pig Latin program as input, compiles it into one or more Map-Reduce jobs, and then executes those jobs on a given Hadoop cluster. 


Any Pig script/program, whether it is running in local mode or MapReduce mode, goes through a series of transformation steps before being executed.

Steps:
1. Parse: the Pig Latin script is parsed, syntax and types are checked, and a logical plan is produced.
2. Logical optimization: the logical plan is optimized (for example, filter and projection pushdown).
3. Physical plan: the optimized logical plan is translated into a physical plan of operators.
4. MapReduce plan: the physical operators are assigned to map and reduce stages, yielding one or more MapReduce jobs.
5. Execution: the jobs are submitted to the Hadoop cluster and run in dependency order.

Wednesday, February 24, 2016

java.lang.RuntimeException: readObject can't find class

INFO mapred.JobClient: Task Id : attempt_201512031955_66234_m_00025_0, Status : FAILED
java.lang.RuntimeException: readObject can't find class
at org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:135)
at org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readFields(TaggedInputSplit.java:121)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:356)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:640)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)


Possible Causes:
1. Check if the line below is present in the driver class (see the driver sketch after this list):

      job.setJarByClass(MyDriver.class);  // sets the jar file in which each node will look for the Mapper and Reducer classes

   If this call is missing, you will see lots of FAILED tasks in the JobTracker.

2. Check that any other classes being set, such as the combiner and the partitioner, are set correctly:


   job.setPartitionerClass(CustomPartitioner.class);
   job.setCombinerClass(MyCombiner.class);
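
A minimal driver sketch showing where these calls go. The class names (MyDriver, MyMapper, MyReducer, MyCombiner, CustomPartitioner) and the Text/IntWritable output types are placeholders assumed to be defined elsewhere, not details from the failing job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my job");

        // Ships the jar containing this class so that every node can load the
        // Mapper, Reducer, Combiner and Partitioner classes at runtime.
        job.setJarByClass(MyDriver.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setCombinerClass(MyCombiner.class);
        job.setPartitionerClass(CustomPartitioner.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}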