-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for concurrent write on Iceberg transformed column #24160
base: master
Are you sure you want to change the base?
Add support for concurrent write on Iceberg transformed column #24160
Conversation
0f00cdd
to
ff872d7
Compare
@@ -2740,7 +2740,8 @@ private void finishWrite(ConnectorSession session, IcebergTableHandle table, Col | |||
|
|||
RowDelta rowDelta = transaction.newRowDelta(); | |||
table.getSnapshotId().map(icebergTable::snapshot).ifPresent(s -> rowDelta.validateFromSnapshot(s.snapshotId())); | |||
TupleDomain<IcebergColumnHandle> dataColumnPredicate = table.getEnforcedPredicate().filter((column, domain) -> !isMetadataColumnId(column.getId())); | |||
TupleDomain<IcebergColumnHandle> dataColumnPredicate = table.getEnforcedPredicate().intersect(table.getUnenforcedPredicate()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we intersect enforced with the unenforced predicate?
They are rather unrelated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that both carriers information about predicate.
The difference is that taken from @raunaqmorarka explanation:
enforced is the part of the predicate which is guaranteed to be satisfied by the connector, so the engine will not apply it on it's side.
unenforced is the part of the predicate which connector cannot guarantee even if it is able to use it to reduce output, so the engine will apply it on the connector output
@raunaqmorarka Correct me if I'm wrong here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{2:part:date=[ SortedRangeSet[type=date, ranges=1, {[2024-01-01]}] ]}
I'm checking io.trino.plugin.iceberg.TestIcebergLocalConcurrentWritesTest#testConcurrentUpdateWithOverlappingPartitionTransformation
(BTW really cool that we have this battery of new concurrency tests)
It's unclear to me why the partition predicate is not an "enforced predicate" while debugging your code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding such predicates for transformed columns as enforced, would mean pushdowns of those values and connector would need to filter files and rows also during reading from such table as well. It's not supported right now and would require more work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are rather unrelated.
We already join those in IcebergSplitSource
trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java
Line 259 in 61a79dd
TupleDomain<IcebergColumnHandle> effectivePredicate = TupleDomain.intersect( |
ff872d7
to
8da17cc
Compare
8da17cc
to
c9670a6
Compare
Description
Add concurrent write to scenarios like:
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: