
[SUPPORT] Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool (com.pe.skull.titan.utils.SparkUtils) #12189

Open
sushant-searce opened this issue Nov 1, 2024 · 11 comments
Labels
meta-sync priority:critical production down; pipelines stalled; Need help asap. table-service

Comments

@sushant-searce

Hello Hoodie support,

We are migrating from Hudi 0.12 to Hudi 0.15 in our production pipelines, but we are unable to complete the migration: the pipelines are failing consistently and we have a pipeline outage.

Sharing the error trace for one of the pipelines:

```

ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/conf.dist/ivysettings.xml will be used org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81) at org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:1015) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.hudi.HoodieSparkSqlWriterInternal.metaSync(HoodieSparkSqlWriter.scala:1013) at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1112) at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:508) at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:473) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:473) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:449) at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98) at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85) at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83) at 
org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:859) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:388) at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:361) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240) at com.pe.skull.titan.utils.SparkUtils.writeToTable(SparkUtils.java:137) at com.pe.skull.titan.tasks.PipelineRunner.writeData(PipelineRunner.java:154) at com.pe.skull.titan.tasks.PipelineRunner.processBatch(PipelineRunner.java:118) at com.pe.skull.titan.tasks.PipelineRunner.lambda$startPipelines$51830645$1(PipelineRunner.java:77) at org.apache.spark.sql.streaming.DataStreamWriter.$anonfun$foreachBatch$1(DataStreamWriter.scala:505) at org.apache.spark.sql.streaming.DataStreamWriter.$anonfun$foreachBatch$1$adapted(DataStreamWriter.scala:505) at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:34) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$17(MicroBatchExecution.scala:732) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$16(MicroBatchExecution.scala:729) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:729) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:286) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:249) at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:239) at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:311) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:289) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$1(StreamExecution.scala:211) at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:211) Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing product_versions_snapshot_nrt at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:170) at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79) ... 71 more Caused by: java.lang.NullPointerException at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$null$5(TimelineUtils.java:114) at java.base/java.util.HashMap.forEach(HashMap.java:1337) at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$getDroppedPartitions$6(TimelineUtils.java:113) at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658) at org.apache.hudi.common.table.timeline.TimelineUtils.getDroppedPartitions(TimelineUtils.java:110) at org.apache.hudi.sync.common.HoodieSyncClient.getDroppedPartitionsSince(HoodieSyncClient.java:97) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:289) at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:179) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:167) ... 72 more org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81) at org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:1015) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.hudi.HoodieSparkSqlWriterInternal.metaSync(HoodieSparkSqlWriter.scala:1013) at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1112) at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:508) at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107) at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:473) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:473) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:449) at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98) at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85) at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83) at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:859) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:388) at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:361) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240) at com.pe.skull.titan.utils.SparkUtils.writeToTable(SparkUtils.java:137) at com.pe.skull.titan.tasks.PipelineRunner.writeData(PipelineRunner.java:154) at com.pe.skull.titan.tasks.PipelineRunner.processBatch(PipelineRunner.java:118) at com.pe.skull.titan.tasks.PipelineRunner.lambda$startPipelines$51830645$1(PipelineRunner.java:77) at org.apache.spark.sql.streaming.DataStreamWriter.$anonfun$foreachBatch$1(DataStreamWriter.scala:505) at org.apache.spark.sql.streaming.DataStreamWriter.$anonfun$foreachBatch$1$adapted(DataStreamWriter.scala:505) at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:34) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$17(MicroBatchExecution.scala:732) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$16(MicroBatchExecution.scala:729) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:729) at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:286) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:249) at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:239) at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:311) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:289) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$1(StreamExecution.scala:211) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:211) Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing i_recommendations_widget_shown_snapshot_nrt at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:170) at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79) ... 71 more Caused by: java.lang.NullPointerException at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$null$5(TimelineUtils.java:114) at java.base/java.util.HashMap.forEach(HashMap.java:1337) at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$getDroppedPartitions$6(TimelineUtils.java:113) at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658) at org.apache.hudi.common.table.timeline.TimelineUtils.getDroppedPartitions(TimelineUtils.java:110) at org.apache.hudi.sync.common.HoodieSyncClient.getDroppedPartitionsSince(HoodieSyncClient.java:97) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:289) at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:179) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:167) ... 72 more

```

Can anyone help here? This is urgent and is causing a production outage.
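
For readers skimming the trace: the failure is not in our application code; the root cause is a NullPointerException inside org.apache.hudi.common.table.timeline.TimelineUtils.getDroppedPartitions while HiveSyncTool iterates clean-commit metadata. A minimal, self-contained illustration of this failure pattern (hypothetical names and values, not Hudi's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class DroppedPartitionsNpeIllustration {
    public static void main(String[] args) {
        // Hypothetical stand-in for the per-partition clean metadata that gets parsed from a
        // clean instant; an entry written by an older release may deserialize with a null value.
        Map<String, Long> filesRemainingPerPartition = new HashMap<>();
        filesRemainingPerPartition.put("dt=2024-10-31", 3L);
        filesRemainingPerPartition.put("dt=2024-11-01", null); // value missing in the old layout

        filesRemainingPerPartition.forEach((partition, remaining) -> {
            // Auto-unboxing the null Long throws NullPointerException, matching the
            // HashMap.forEach frame in the reported stack trace.
            if (remaining == 0L) {
                System.out.println("dropped partition: " + partition);
            }
        });
    }
}
```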

@sushant-searce
Author

Hello Team,

Hudi 0.15
We have performed some troubleshooting and tried different hoodie properties. Sharing the test cases with you below; a sketch of how these options are applied to the writer follows Test Case 4.

Test Case 1
Pipeline Run Status : Success

hoodie.clean.automatic: false 
hoodie.clean.async: false 
hoodie.datasource.hive_sync.enable: false 

Test Case 2
Pipeline Run Status : Fail [ Table is different than case 1 ]

hoodie.clean.automatic: false 
hoodie.clean.async: false 
hoodie.datasource.hive_sync.enable: true 

Test Case 3
Pipeline Run Status : Success [ Table is different than case 1 ]

hoodie.clean.automatic: true 
hoodie.clean.async: false  
hoodie.datasource.hive_sync.enable: false

Test Case 4
Pipeline Run Status : Success [ Table same as case 1 ]

hoodie.clean.automatic: true 
hoodie.clean.async: false  
hoodie.datasource.hive_sync.enable: true
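
For context, this is roughly how the three flags above are toggled on the writer in each test case (a minimal sketch with an illustrative table name and path; our real write goes through SparkUtils.writeToTable as shown in the trace):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public final class HudiWriteSketch {
    // Writes one batch to the Hudi table; the three options below are the ones varied
    // across Test Cases 1-4.
    public static void writeToTable(Dataset<Row> df, String basePath,
                                    boolean cleanAutomatic, boolean cleanAsync, boolean hiveSync) {
        df.write()
          .format("hudi")
          .option("hoodie.table.name", "product_versions_snapshot_nrt")   // table name from the trace
          .option("hoodie.clean.automatic", String.valueOf(cleanAutomatic))
          .option("hoodie.clean.async", String.valueOf(cleanAsync))
          .option("hoodie.datasource.hive_sync.enable", String.valueOf(hiveSync))
          .mode(SaveMode.Append)
          .save(basePath);                                                 // e.g. gs://<bucket>/hudi/<table>
    }
}
```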

@sushant-searce
Author

sushant-searce commented Nov 1, 2024

As you can see from the test cases I have shared above, the pipeline was working yesterday after disabling and then re-enabling hive_sync.

Yesterday we disabled hive_sync in the pipeline and it ran successfully, and we enabled it again for the next run, which succeeded as well.

But in TODAY's run it FAILED with the same error.

This is very concerning. Is there anything we are missing here?

Sharing the Hoodie options as well; a sketch of how they are applied in our streaming writer follows the list.

hudiOptions:

hoodie.cleaner.commits.retained: 10
hoodie.metadata.keep.max.commits: 30
hoodie.metadata.clean.async: false
hoodie.keep.max.commits: 30
hoodie.metadata.keep.min.commits: 20
hoodie.archive.async: false
hoodie.clean.automatic: true
hoodie.finalize.write.parallelism: 200
hoodie.fail.on.timeline.archiving: false
hoodie.clean.async: false
hoodie.parquet.max.file.size: 128000000
hoodie.datasource.hive_sync.support_timestamp : true
#DISABLING METADATA TO REDUCE FREQUENT CALLS TO GCS
hoodie.metadata.enable: false
hoodie.datasource.write.hive_style_partitioning : true
hoodie.parquet.small.file.limit: 100000000
hoodie.datasource.hive_sync.enable: true
hoodie.bulkinsert.shuffle.parallelism: 200
hoodie.keep.min.commits: 11
hoodie.datasource.meta.sync.enable: true
hoodie.metadata.cleaner.commits.retained: 3
hoodie.cleaner.incremental.mode: true
hoodie.commits.archival.batch: 12
hoodie.upsert.shuffle.parallelism: 200
hive_sync.support_timestamp: true
hoodie.insert.shuffle.parallelism: 200
hoodie.metadata.compact.max.delta.commits: 10
compaction.delta_commits: 5
metadata.compaction.delta_commits: 10
hoodie.compact.inline.max.delta.commits: 5
hoodie.archive.automatic: true
hoodie.cleaner.parallelism: 200
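
For reference, a rough sketch of how these options are applied in our streaming job, matching the foreachBatch → DataFrameWriter.save path in the stack trace (class and variable names here are illustrative, not our exact PipelineRunner code):

```java
import java.util.Map;

import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public final class StreamingWriteSketch {
    public static void start(Dataset<Row> stream, Map<String, String> hudiOptions, String basePath)
            throws Exception {
        // Each micro-batch is written to the Hudi table with the option map above.
        // Hive sync (the failing HiveSyncTool step) runs at the end of every such write.
        VoidFunction2<Dataset<Row>, Long> writeBatch = (batch, batchId) ->
            batch.write()
                 .format("hudi")
                 .options(hudiOptions)
                 .mode(SaveMode.Append)
                 .save(basePath);

        stream.writeStream()
              .foreachBatch(writeBatch)
              .start()
              .awaitTermination();
    }
}
```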

@danny0405
Contributor

Similar issue already reported: #11955.

@danny0405
Contributor

@ad1happy2go Can you prioritize this issue? Multiple issues have been reported.

github-project-automation bot moved this to ⏳ Awaiting Triage in Hudi Issue Support Nov 2, 2024
danny0405 added the priority:critical production down; pipelines stalled; Need help asap. label Nov 2, 2024
@sushant-searce
Author

@danny0405 @ad1happy2go

Yes, I went through ticket #11955, but I don't see any solution attached to it.

If you can share a solution with me, that would really help.

@sushant-searce
Author

@danny0405 @ad1happy2go

Just for your reference

Hadoop - 3.3.6
Hive - 3.1.3
Hudi - 0.15.0
Spark - 3.5.1

@ad1happy2go
Collaborator

@sushant-searce Can you provide more details about your environment? Are you using EMR or Dataproc? If yes, can you provide us details about that?
I tried with OSS Spark 3.5.1 and Hudi 0.15.0 but was unable to reproduce the issue.

@sushant-searce
Author

Hello @danny0405 @ad1happy2go ,

We are using Dataproc.

@ad1happy2go
Collaborator

@sushant-searce Thanks for all your support, but can you please tell us the full details of the Dataproc version you are using and provide steps to reproduce?

@sushant-searce
Author

Hello @ad1happy2go

We are using Dataproc version 2.2.

To reproduce the issue:

  • Write data into GCS using Hudi 0.12 with the Hudi properties I shared above.
  • Try to rewrite the data into GCS using Hudi 0.15 (see the sketch below).
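
A minimal sketch of that reproduction (illustrative table name, input, and GCS path; the only thing that changes between the two runs is the Hudi version on the cluster):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public final class MigrationRepro {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("hudi-migration-repro").getOrCreate();
        Dataset<Row> df = spark.read().json("gs://my-bucket/input/");  // any small dataset works

        // Run 1: submit with the Hudi 0.12 bundle on the classpath.
        // Run 2: submit the exact same job with the Hudi 0.15 bundle; in our case this second
        //        run hits the HiveSyncTool failure shown above.
        // (Our real runs use the full hudiOptions map shared earlier; this is a subset.)
        df.write()
          .format("hudi")
          .option("hoodie.table.name", "repro_table")
          .option("hoodie.datasource.hive_sync.enable", "true")
          .option("hoodie.clean.automatic", "true")
          .mode(SaveMode.Append)
          .save("gs://my-bucket/hudi/repro_table");
    }
}
```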

@sushant-searce
Author

Hello @ad1happy2go @danny0405

It looks like there is some issue with the Hudi clean files.
The schema of the clean files is different between Hudi 0.12 and Hudi 0.15, and that is what is causing the issue.
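
For anyone who wants to check the same thing: a small sketch that just lists the completed *.clean instants in the timeline folder with the plain Hadoop FileSystem API (bucket and table path are illustrative). The payload of these files is Avro-serialized clean metadata, which is where a schema difference between the 0.12-written and 0.15-written instants would show up:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ListCleanInstants {
    public static void main(String[] args) throws Exception {
        String basePath = "gs://my-bucket/hudi/product_versions_snapshot_nrt";  // illustrative
        FileSystem fs = FileSystem.get(URI.create(basePath), new Configuration());

        // Completed clean instants are stored in the timeline folder as <instant_time>.clean;
        // compare one written before the upgrade with one written after it.
        FileStatus[] cleans = fs.globStatus(new Path(basePath + "/.hoodie/*.clean"));
        if (cleans != null) {
            for (FileStatus status : cleans) {
                System.out.println(status.getPath() + "  (" + status.getLen() + " bytes)");
            }
        }
    }
}
```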
