Add support for concurrent write on Iceberg transformed column #24160

Open: pajaks wants to merge 2 commits into master

@pajaks pajaks commented Nov 18, 2024

Description

Add support for concurrent writes in scenarios like:

-- create test table in iceberg
create table all_defaults_partitioned
with (
  partitioning = array['month(shipdate)']
)
as select * from tpch.sf1000.lineitem

-- first session
update all_defaults_partitioned
set orderkey = 654
where shipdate = date '1995-01-01'

-- second session 
update all_defaults_partitioned
set orderkey = 765
where shipdate = date '1996-12-01'
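
To illustrate why these two updates are candidates for running concurrently, here is a minimal sketch of Iceberg's `month` transform, which maps a date to the number of months since 1970-01. The class and method names are hypothetical and are not part of Trino or Iceberg; the only point is that the two `shipdate` values from the sessions above land in different partitions.

```java
import java.time.LocalDate;

// Hypothetical helper, not Trino or Iceberg code.
class MonthPartitionSketch {
    // Iceberg's month transform: months elapsed since 1970-01 (negative before the epoch).
    static int monthPartition(LocalDate date) {
        return (date.getYear() - 1970) * 12 + (date.getMonthValue() - 1);
    }

    // Two row-level predicates can only touch the same files if they map to the same partition value.
    static boolean touchSamePartition(LocalDate a, LocalDate b) {
        return monthPartition(a) == monthPartition(b);
    }

    public static void main(String[] args) {
        LocalDate first = LocalDate.of(1995, 1, 1);   // first session
        LocalDate second = LocalDate.of(1996, 12, 1); // second session
        System.out.println(monthPartition(first));             // prints 300
        System.out.println(monthPartition(second));            // prints 323
        System.out.println(touchSamePartition(first, second)); // prints false
    }
}
```

Since the partition values differ, neither update rewrites data that the other one reads, so a conflict check scoped to the partition should let both commit.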

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* Add support for concurrent writes on tables partitioned by a transformed column

@cla-bot cla-bot bot added the cla-signed label Nov 18, 2024
@github-actions github-actions bot added the iceberg Iceberg connector label Nov 18, 2024
@pajaks pajaks force-pushed the pajaks/iceberg_partition_concurrent_writes branch from 0f00cdd to ff872d7 Compare November 18, 2024 11:10
@@ -2740,7 +2740,8 @@ private void finishWrite(ConnectorSession session, IcebergTableHandle table, Col

RowDelta rowDelta = transaction.newRowDelta();
table.getSnapshotId().map(icebergTable::snapshot).ifPresent(s -> rowDelta.validateFromSnapshot(s.snapshotId()));
- TupleDomain<IcebergColumnHandle> dataColumnPredicate = table.getEnforcedPredicate().filter((column, domain) -> !isMetadataColumnId(column.getId()));
+ TupleDomain<IcebergColumnHandle> dataColumnPredicate = table.getEnforcedPredicate().intersect(table.getUnenforcedPredicate())
Contributor

Why do we intersect enforced with the unenforced predicate?
They are rather unrelated.

Member Author

@pajaks pajaks Nov 20, 2024

My understanding is that both carry information about the predicate.
The difference, taken from @raunaqmorarka's explanation:

enforced is the part of the predicate which is guaranteed to be satisfied by the connector, so the engine will not apply it on its side.
unenforced is the part of the predicate which the connector cannot guarantee, even if it is able to use it to reduce output, so the engine will apply it to the connector output.

@raunaqmorarka Correct me if I'm wrong here
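
The enforced/unenforced distinction described above can be sketched with a toy model. This is not the Trino SPI: `PredicateSplitSketch`, the column names, and the idea of a fixed set of "guaranteed" columns are all assumptions made for illustration. A predicate is modeled as a map from column name to allowed values.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only; the real logic lives in the connector's filter pushdown.
class PredicateSplitSketch {
    final Map<String, Set<String>> enforced = new TreeMap<>();
    final Map<String, Set<String>> unenforced = new TreeMap<>();

    // Columns the connector can fully guarantee go to "enforced";
    // everything else stays "unenforced" and is re-applied by the engine.
    static PredicateSplitSketch split(Map<String, Set<String>> predicate, Set<String> guaranteedColumns) {
        PredicateSplitSketch result = new PredicateSplitSketch();
        for (Map.Entry<String, Set<String>> entry : predicate.entrySet()) {
            Map<String, Set<String>> target =
                    guaranteedColumns.contains(entry.getKey()) ? result.enforced : result.unenforced;
            target.put(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: "region" is an identity-partitioned column, "part" is partitioned via a transform.
        Map<String, Set<String>> predicate = Map.of(
                "region", Set.of("EU"),
                "part", Set.of("2024-01-01"));
        PredicateSplitSketch result = split(predicate, Set.of("region"));
        System.out.println(result.enforced);   // prints {region=[EU]}
        System.out.println(result.unenforced); // prints {part=[2024-01-01]}
    }
}
```

In this model, a write that only looks at `enforced` never sees the constraint on `part`, which is the gap this change is trying to close.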

Contributor

{2:part:date=[ SortedRangeSet[type=date, ranges=1, {[2024-01-01]}] ]}

I'm checking io.trino.plugin.iceberg.TestIcebergLocalConcurrentWritesTest#testConcurrentUpdateWithOverlappingPartitionTransformation (BTW really cool that we have this battery of new concurrency tests)

While debugging your code, it's unclear to me why the partition predicate is not an "enforced predicate".

Member Author

Marking such predicates on transformed columns as enforced would mean pushing those values down, and the connector would then also need to filter files and rows when reading from such a table. That is not supported right now and would require more work.

Member Author

"They are rather unrelated."

We already join those in IcebergSplitSource:

TupleDomain<IcebergColumnHandle> effectivePredicate = TupleDomain.intersect(
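
As a rough illustration of what intersecting two per-column predicates means, the sketch below uses plain maps of allowed values instead of Trino's `TupleDomain`. The class name, the `region` column, and the value sets are hypothetical; a column missing from a map is treated as unconstrained, and an empty set as matching nothing.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified stand-in for a per-column predicate intersection; not Trino's TupleDomain.
class DomainIntersectSketch {
    static Map<String, Set<String>> intersect(Map<String, Set<String>> left, Map<String, Set<String>> right) {
        Map<String, Set<String>> result = new TreeMap<>();
        // Start from a mutable copy of the left side.
        left.forEach((column, values) -> result.put(column, new TreeSet<>(values)));
        // Columns only on the right are added as-is; shared columns keep the common values.
        right.forEach((column, values) ->
                result.merge(column, new TreeSet<>(values), (existing, incoming) -> {
                    existing.retainAll(incoming);
                    return existing;
                }));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> enforced = Map.of("region", Set.of("EU", "US"));
        Map<String, Set<String>> unenforced = Map.of(
                "region", Set.of("EU"),
                "part", Set.of("2024-01-01"));
        System.out.println(intersect(enforced, unenforced)); // prints {part=[2024-01-01], region=[EU]}
    }
}
```

The combined predicate is at least as narrow as either input, so using it to scope a write keeps the constraint on the transformed partition column that the enforced part alone would drop.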

@pajaks pajaks force-pushed the pajaks/iceberg_partition_concurrent_writes branch from ff872d7 to 8da17cc Compare November 20, 2024 12:55
@pajaks pajaks force-pushed the pajaks/iceberg_partition_concurrent_writes branch from 8da17cc to c9670a6 Compare November 21, 2024 10:40
@pajaks pajaks marked this pull request as ready for review November 21, 2024 10:46