-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-8550] Make Hudi 1.x write timeline to a dedicated timeline folder under .hoodie #12288
base: master
Are you sure you want to change the base?
Conversation
if (needsUpgradeOrDowngrade(metaClient)) { | ||
// unclear what instant to use, since upgrade does have a given instant. | ||
executeUsingTxnManager(Option.empty(), () -> tryUpgrade(metaClient, Option.empty())); | ||
updatedMetaClient = createMetaClient(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We better update the options within tryUpgrade
, like what we already do: reloadTableConfig
, reloadActiveTimeline
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metaClient is locally passed to this method which in turn needs to be passed to startCommit. On upgrade, metaclient is stale (wrong layoutVetsion) and needs to be updated. This is the reason for the above change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just refresh the timelineLayoutVersion
and timelineLayout
from the table config right after the table config are reloaded, so the refresh sequence become:
- refresh table config
- refresh timeline layout
- refresh timeline
Or we just remove the reload in tryUpgrade
which is much simpler, and we need to take care all the calls of tryUpgrade
.
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CommitMetadataSerDe.java
Outdated
Show resolved
Hide resolved
@danny0405 : Apart from the two comments, Are you ok with other changes ? |
Yes, overall looks good to me, I reviewed |
...mmon/src/main/java/org/apache/hudi/common/table/timeline/versioning/v2/ActiveTimelineV2.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
Outdated
Show resolved
Hide resolved
@bvaradar Have not looked very closely. but can we also make sure the lsm timeline moves along with new folder.. Call it so
over 1.1 or 1.2 I want to fully converge it to a single timeline. |
...-client-common/src/main/java/org/apache/hudi/table/upgrade/EightToSevenDowngradeHandler.java
Outdated
Show resolved
Hide resolved
5a9c25f
to
a90e9ea
Compare
@@ -711,7 +711,7 @@ class TestSpark3DDL extends HoodieSparkSqlTestBase { | |||
|
|||
test("Test schema auto evolution") { | |||
withTempDir { tmp => | |||
Seq("COPY_ON_WRITE", "MERGE_ON_READ").foreach { tableType => | |||
Seq("COPY_ON_WRITE").foreach { tableType => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to revisit this after fixing all other tests. For MOR, this test fails due to
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/private/var/folders/s5/pqxf5ndx12qg6h0zgl2d9zxh0000gn/T/spark-83fab01d-7af8-4c22-9dee-8d840aa02e90/h1/americas/brazil/sao_paulo/c7c9ab23-56f7-45f4-bdbe-d7a8de9671bf-0_0-22-35_20241122094757341.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:264)
at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:210)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRowIndexUtil$RecordReaderWithRowIndexes.nextKeyValue(ParquetRowIndexUtil.scala:89)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.hasNext(RecordReaderIterator.scala:61)
at org.apache.hudi.util.CloseableInternalRowIterator.hasNext(CloseableInternalRowIterator.scala:50)
at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.doHasNext(HoodieKeyBasedFileGroupRecordBuffer.java:134)
at org.apache.hudi.common.table.read.HoodieBaseFileGroupRecordBuffer.hasNext(HoodieBaseFileGroupRecordBuffer.java:149)
at org.apache.hudi.common.table.read.HoodieFileGroupReader.hasNext(HoodieFileGroupReader.java:235)
at org.apache.hudi.common.table.read.HoodieFileGroupReader$HoodieFileGroupReaderIterator.hasNext(HoodieFileGroupReader.java:289)
at org.apache.spark.sql.execution.datasources.parquet.HoodieFileGroupReaderBasedParquetFileFormat$$anon$1.hasNext(HoodieFileGroupReaderBasedParquetFileFormat.scala:273)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:129)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:283)
... 22 more
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableAny cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableDouble
at org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.setDouble(SpecificInternalRow.scala:284)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$RowUpdater.setDouble(ParquetRowConverter.scala:185)
at org.apache.spark.sql.execution.datasources.parquet.ParquetPrimitiveConverter.addDouble(ParquetRowConverter.scala:96)
at org.apache.parquet.column.impl.ColumnReaderBase$2$2.writeValue(ColumnReaderBase.java:269)
at org.apache.parquet.column.impl.ColumnReaderBase.writeCurrentValueToConverter(ColumnReaderBase.java:440)
at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:30)
at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:234)
... 34 more
Change Logs
Hudi 1.x to use dedicated timeline folder under .hoodie
Impact
Timeline folder to move to a dedicated folder.
Risk level (write none, low medium or high below)
none
Documentation Update
none
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist