Thursday, February 23, 2017

org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum 1490267964939 is not in union ["null","long"]

Problem in Pig when storing with AvroStorage():

org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum 1490267964939 is not in union ["null","long"]

Solution:
1. Check the data types carefully when storing with the final schema.
2. The most probable cause is a null value being typecast to int or long. Avro, however, tends to report the error against the next record, which may itself be perfectly valid.

In our case the error did not point at the actual null value; it was reported for a valid value. I think this misleading error message makes Avro problems difficult to diagnose.
For example:
A = LOAD 'surjan/data1' using org.apache.pig.piggybank.storage.avro.AvroStorage();
B = foreach A generate date, empId;
C = DISTINCT B;
store C into 'surjan/data2' ;

If the dataset 'surjan/data1' is not present, Avro complains that date or empId was not found, instead of saying that the data does not exist or that the input matches 0 files.
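
One way to act on points 1 and 2 is to look at the schema Pig has inferred and to make the cast explicit before storing. This is only a minimal sketch. It assumes that date is the field meant to be stored as a long and that rows with a null date can be dropped while diagnosing. The relation name B_clean and the output path 'surjan/data2_checked' are made up for illustration:

```pig
A = LOAD 'surjan/data1' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
-- Show the field types Pig read from the Avro input.
DESCRIBE A;
-- Assumption: date should be a long; cast it explicitly instead of relying on an implicit cast.
B = FOREACH A GENERATE (long) date AS date, empId;
-- Assumption: rows with a null date are not needed while tracking down the error.
B_clean = FILTER B BY date IS NOT NULL;
C = DISTINCT B_clean;
-- Confirm the final types before they reach AvroStorage.
DESCRIBE C;
STORE C INTO 'surjan/data2_checked' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```

If the store succeeds with the filter and fails without it, the null values are the likely culprit rather than the record named in the error.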


3. Using AvroStorage with the index option and a schema. The index option should be used when storing more than one dataset with an Avro schema.

Store finalData  into 'surjan/location' USING org.apache.pig.piggybank.storage.avro.AvroStorage('index', '0','schema','{"namespace":"com.surjan.schema.myapp.avro","type":"record","name":"Mydaily jon","doc":"Avro storing with schema using Pig.","fields" ...rest of schema


org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Unsupported type in record:class java.lang.Long
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
    at

Solution: This is an issue with storing a single field in Avro. Store one more dummy field and the error goes away.
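
The dummy-field workaround could look roughly like the sketch below. The field name value, the relation names single_col and with_dummy, and the output path 'surjan/location_with_dummy' are assumptions made for illustration; only finalData comes from the statement above:

```pig
-- Hypothetical: single_col holds only one field, here called value.
single_col = FOREACH finalData GENERATE value;
-- Workaround: carry one extra constant field alongside the real one.
with_dummy = FOREACH single_col GENERATE value, 1 AS dummy;
-- Check that the relation now has two fields.
DESCRIBE with_dummy;
STORE with_dummy INTO 'surjan/location_with_dummy' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```

If an explicit schema is passed to AvroStorage, as in the store statement above, it would presumably need a matching extra field for the dummy column.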

Also see: https://issues.apache.org/jira/browse/PIG-3358
