Tuesday, April 26, 2016

Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java

Problem: A custom key/value class that implements the Writable interface fails during deserialization with "Caused by: java.io.EOFException at java.io.DataInputStream.readInt".

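Solution: If a custom Writable class has Java primitive fields, write each field with the type-specific DataOutput method and read it back with the matching DataInput method. The generic write(int) writes only the low-order byte of its argument, so a later readInt(), which expects four bytes, runs past the end of the data.
For example:
- writeUTF(String s) / readUTF() for String
- writeInt(int i) / readInt() for int
- writeLong(long l) / readLong() for long
- writeBoolean(boolean b) / readBoolean() for boolean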

Stack-Trace:
java.lang.Exception: java.io.EOFException
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at com.apple.Comments.MinMaxCountTuple.readFields(MinMaxCountTuple.java:48)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:145)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)


Custom class: The exception occurs with the custom class below.

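import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class MinMaxCountTuple implements Writable {
    private int min;
    private int max;
    private int count;

    // Buggy: DataOutput.write(int) writes only the low-order byte of each value,
    // so this method emits 3 bytes in total.
    public void write(DataOutput out) throws IOException {
        out.write(this.min);
        out.write(this.max);
        out.write(this.count);
    }

    // Each readInt() expects 4 bytes (12 in total), so the first call
    // already runs out of data and throws EOFException.
    public void readFields(DataInput in) throws IOException {
        this.min = in.readInt();
        this.max = in.readInt();
        this.count = in.readInt();
    }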

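    // Fix: replacing the write method with the version below resolved the exception.
    // writeInt() emits 4 bytes per field, which is what readInt() expects.
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.min);
        out.writeInt(this.max);
        out.writeInt(this.count);
    }
}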
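
The byte-count mismatch can be reproduced without a Hadoop cluster, using only java.io streams. The sketch below is a standalone illustration, not part of the original job: the class name WriteVsWriteInt and its helper methods are made up for this example. It serializes three ints once with write(int) and once with writeInt(int), then tries to read them back with readInt().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the original MapReduce job.
class WriteVsWriteInt {

    // Serializes three ints with write(int): one byte per value.
    static byte[] withWrite(int min, int max, int count) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.write(min);
            out.write(max);
            out.write(count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Serializes three ints with writeInt(int): four bytes per value.
    static byte[] withWriteInt(int min, int max, int count) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(min);
            out.writeInt(max);
            out.writeInt(count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Returns true if reading three ints with readInt() hits end of stream.
    static boolean readFails(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readInt();
            in.readInt();
            in.readInt();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withWrite(1, 2, 3).length);            // 3
        System.out.println(withWriteInt(1, 2, 3).length);         // 12
        System.out.println(readFails(withWrite(1, 2, 3)));        // true
        System.out.println(readFails(withWriteInt(1, 2, 3)));     // false
    }
}
```

The same pairing rule applies to every field type: whatever method writes a field in write() needs its counterpart in readFields(), in the same order.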

Monday, April 11, 2016

Splitting on dot operator in Pig

Input:
101, iOS8.4
102, POS6.7

Expected output:
101, iOS8
102, POS6


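-- STRSPLIT takes a regular expression, so the dot is written as the escape \\u002E to match a literal '.'.
A = LOAD '/home/hadoop/work/surjan/token/Test.txt' USING PigStorage(',') AS (id:long, a1:chararray);
B = FOREACH A GENERATE $0, FLATTEN(STRSPLIT(a1,'\\u002E')) AS (a1:chararray, a1of1:chararray);
-- Keep only the id and the part before the dot.
C = FOREACH B GENERATE $0, a1;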
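
STRSPLIT splits around matches of a Java regular expression, which is why a bare '.' does not work as the delimiter: in a regex it matches any character. The behavior can be seen in plain Java with String.split. The sketch below is only an illustration; the class name DotSplit and the method beforeDot are invented for this example.

```java
import java.util.Arrays;

// Hypothetical demo class showing why the dot must be escaped.
class DotSplit {

    // Returns the part of the string before the first literal dot.
    static String beforeDot(String s) {
        return s.split("\\.")[0];
    }

    public static void main(String[] args) {
        // Unescaped dot: every character is a delimiter, so nothing is left.
        System.out.println("iOS8.4".split(".").length);                   // 0

        // Escaped dot: splits on the literal '.' only.
        System.out.println(Arrays.toString("iOS8.4".split("\\.")));       // [iOS8, 4]

        // The \u002E form used in the Pig script is the same escape spelled as a code point.
        System.out.println(Arrays.toString("POS6.7".split("\\u002E")));   // [POS6, 7]

        System.out.println(beforeDot("iOS8.4"));                          // iOS8
    }
}
```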