[SUPPORT] Hive meta sync null pointer issue #11955
Comments
It looks like the clean instant is missing:

```java
try {
  HoodieCleanMetadata cleanMetadata = TimelineMetadataUtils.deserializeHoodieCleanMetadata(
      cleanerTimeline.getInstantDetails(instant).get());
  cleanMetadata.getPartitionMetadata().forEach((partition, partitionMetadata) -> {
    if (partitionMetadata.getIsPartitionDeleted()) {
      partitionToLatestDeleteTimestamp.put(partition, instant.getTimestamp());
    }
  });
}
```
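One plausible reading of the NPE (an assumption, not a confirmed diagnosis): `getIsPartitionDeleted()` on the generated clean-metadata class returns a `Boolean`, which can be null in `.clean` files written by older Hudi versions such as 0.10.1, and auto-unboxing that null in the `if` above throws a `NullPointerException`. The sketch below uses simplified stand-in classes (not Hudi's actual ones) to show a null-safe variant of the loop using `Boolean.TRUE.equals(...)`:

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeCleanCheck {

    // Simplified stand-in for HoodieCleanPartitionMetadata (hypothetical, for illustration).
    public static class PartitionMetadata {
        private final Boolean isPartitionDeleted;
        public PartitionMetadata(Boolean isPartitionDeleted) {
            this.isPartitionDeleted = isPartitionDeleted;
        }
        public Boolean getIsPartitionDeleted() {
            return isPartitionDeleted;
        }
    }

    // Null-safe version of the loop quoted above: Boolean.TRUE.equals(null) is
    // simply false, whereas `if (pm.getIsPartitionDeleted())` would NPE on null.
    public static Map<String, String> droppedPartitions(
            Map<String, PartitionMetadata> partitionMetadata, String instantTs) {
        Map<String, String> partitionToLatestDeleteTimestamp = new HashMap<>();
        partitionMetadata.forEach((partition, pm) -> {
            if (Boolean.TRUE.equals(pm.getIsPartitionDeleted())) {
                partitionToLatestDeleteTimestamp.put(partition, instantTs);
            }
        });
        return partitionToLatestDeleteTimestamp;
    }

    public static void main(String[] args) {
        Map<String, PartitionMetadata> parts = new HashMap<>();
        parts.put("2024/01/01", new PartitionMetadata(Boolean.TRUE));
        parts.put("2024/01/02", new PartitionMetadata(null)); // field absent in old metadata
        System.out.println(droppedPartitions(parts, "20240102120000"));
    }
}
```

With the null-safe check, partitions whose `isPartitionDeleted` field is missing are treated as not deleted instead of crashing the sync.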
@danny0405 What is the above process doing, and how will it help solve our use case?
Did you check your clean metadata file to see what the partition metadata looks like?
@danny0405 Before that, I want to understand what you think might have happened that led you to propose the above solution. It would have worked fine without exceptions had we migrated the Hudi version directly from 0.10.1 to 0.15.0 with Hive sync enabled in the Hudi properties. But we initially migrated from 0.10.1 to 0.15.0 without Hive sync, and then enabled it in the Hudi options. Can you explain what difference would have led to this issue?
Also, my partition metadata in the .clean file looks like:
I have no idea why the Hive sync could affect the hoodie.properties. Did you diff the hoodie.properties between the two kinds of operations?
Just from the error stacktrace, it looks related to the clean metadata.
Hudi options while writing to the Hudi table when we migrated from 0.10.1 to 0.15.0:
hoodie.properties after that migration:
Extra Hudi options added later for Hive sync:
After adding the above options, hoodie.properties never changed.
@danny0405
When running `hadoop version`: Hadoop 3.3.0
This is a Hadoop jar conflict.
Was able to solve this, but we are still getting other exceptions.
Getting exception:
Please help here.
You need to add the AWS SDK jar, which contains the S3 filesystem.
@bytesid19 Let us know in case you need any more help here. Thanks. Feel free to close if we are all good here.
@bytesid19 Closing the issue. Feel free to reopen or create a new one in case of any more issues. Thanks.
Tips before filing an issue
Have you gone through our FAQs?
Join the mailing list to engage in conversations and get faster support at [email protected].
If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
We recently migrated our Hudi version from 0.10.1 to 0.15.0 in our Spark jobs.
Hudi options at that time:
During that time, Hive meta sync was not enabled.
We then had a requirement to sync data to Hive via HMS mode, for which we added the following options to the above:
When we triggered a new Spark job after that to write new data to the same Hudi table, the job ran successfully and added the new data to the Hudi table, but did not update hoodie.properties.
But now, whenever we trigger the Spark job to write new data, the job fails with a NullPointerException:

```
at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$null$5(TimelineUtils.java:114)
at java.util.HashMap.forEach(HashMap.java:1290)
at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$getDroppedPartitions$6(TimelineUtils.java:113)
```
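The failing frame sits inside a `HashMap.forEach` lambda, which is consistent with (though does not prove) an NPE from auto-unboxing a null `Boolean` read out of old clean metadata. A minimal, self-contained illustration of that failure mode:

```java
import java.util.HashMap;
import java.util.Map;

public class UnboxNpeDemo {

    // Demonstrates the suspected failure mode: auto-unboxing a null Boolean
    // inside a forEach lambda throws NullPointerException.
    public static boolean triggersNpe() {
        Map<String, Boolean> partitionDeleted = new HashMap<>();
        partitionDeleted.put("2024/01/02", null); // metadata written without the flag
        try {
            partitionDeleted.forEach((partition, deleted) -> {
                if (deleted) { // unboxes null -> NullPointerException
                    System.out.println(partition + " was deleted");
                }
            });
            return false;
        } catch (NullPointerException e) {
            return true; // the NPE surfaces out of HashMap.forEach, as in the stacktrace
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE thrown: " + triggersNpe());
    }
}
```

Running this prints `NPE thrown: true`; the stack it would produce likewise passes through `HashMap.forEach` and a lambda frame, matching the shape of the reported trace.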
To Reproduce
Steps to reproduce the behavior:
Expected behavior
hoodie.database.name
Environment Description

- Hudi version: 0.15.0
- Spark version: 3.3.2
- Hive version: 3.1.2
- Hadoop version: 3.0.0
- Storage (HDFS/S3/GCS..): S3
- Running on Docker? (yes/no): no
Additional context
Add any other context about the problem here.
Stacktrace