Problem in Pig when using STORE with AvroStorage():
org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum 1490267964939 is not in union ["null","long"]
Solution:
1. Check the data types carefully against the final schema before storing.
2. The most probable cause is a null value being cast to int or long. Avro, however, reports the error for the next record, which may itself be perfectly valid.
In our case the error did not point at the actual null value but at a valid one. This misleading message makes the problem difficult to diagnose.
Another example of a misleading error:
A = LOAD 'surjan/data1' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
B = FOREACH A GENERATE date, empId;
C = DISTINCT B;
STORE C INTO 'surjan/data2';
If the dataset 'surjan/data1' is not present, Avro complains that date or empId was not found instead of saying that the data does not exist or that the path matches 0 files.
3. Use AvroStorage with the index option together with a schema. The index option should be used when storing more than one dataset with an Avro schema.
Store finalData into 'surjan/location' USING org.apache.pig.piggybank.storage.avro.AvroStorage('index', '0','schema','{"namespace":"com.surjan.schema.myapp.avro","type":"record","name":"Mydaily jon","doc":"Avro storing with schema using Pig.","fields" ...rest of schema
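The first two checks above could be sketched in Pig Latin roughly as follows. This is only a sketch under assumptions: it reuses the paths and field names from the example, it assumes that date and empId are meant to be stored as long, and dropping rows with nulls is just one possible way of handling them (substituting a default value is another).

```pig
-- Sketch only: paths and field names come from the example above; the long types are assumed.

-- Confirm the input really exists before trusting any "field not found" message.
fs -ls surjan/data1

A = LOAD 'surjan/data1' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

-- Print the schema Pig sees, to compare it with the Avro schema used when storing.
DESCRIBE A;

-- Drop rows whose fields are null before anything is cast to int or long.
B = FILTER A BY (date IS NOT NULL) AND (empId IS NOT NULL);

-- Cast explicitly so each Pig type matches the Avro type declared for the field.
C = FOREACH B GENERATE (long)date AS date, (long)empId AS empId;
D = DISTINCT C;

-- Check the final schema once more just before the store.
DESCRIBE D;
STORE D INTO 'surjan/data2' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```

Comparing the two DESCRIBE outputs with the schema passed to AvroStorage is usually quicker than reading the Avro error, since the error may name a record that is actually fine.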
org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Unsupported type in record:class java.lang.Long
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
    at ...
Solution: This is an issue with storing a single field in Avro. Store one more dummy field alongside it and the error goes away.
Also see: https://issues.apache.org/jira/browse/PIG-3358
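A minimal sketch of the dummy-field workaround is shown here. The paths 'surjan/single' and 'surjan/single_out' and the field name dummy are hypothetical, chosen only for illustration; empId is borrowed from the earlier example.

```pig
-- Sketch only: the paths and the 'dummy' field name are made up for illustration.
A = LOAD 'surjan/single' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

-- Storing only empId would leave a single-field record, which triggers the error above.
-- Adding a constant second field works around it.
B = FOREACH A GENERATE empId, 0 AS dummy;

STORE B INTO 'surjan/single_out' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```

Readers of the output then need to ignore the extra dummy column.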