Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace partial topN with limit when reading pre-sorted input #6634

Merged
merged 6 commits into from
May 25, 2021

Conversation

raunaqmorarka
Copy link
Member

@raunaqmorarka raunaqmorarka commented Jan 18, 2021

Fixes #5372

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments. @findepi @electrum do you want to take a look?

@@ -16,6 +16,7 @@
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.collect.ImmutableList;
import io.trino.plugin.hive.metastore.SortingColumn;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is metastore-specific class and it could be better to use a dedicaed, metastore-agnostic representation here.

i cannot offer a concrete proposal here yet, though

plugin/trino-hive/src/test/sql/create-test.sql Outdated Show resolved Hide resolved
@raunaqmorarka raunaqmorarka force-pushed the sorted_by_local branch 3 times, most recently from 0e253a4 to 274fdba Compare January 19, 2021 15:34
@raunaqmorarka raunaqmorarka changed the title Populate local sorting property for bucketed sorted hive tables Avoid partial sort when reading pre-sorted input Jan 19, 2021
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments

@raunaqmorarka raunaqmorarka force-pushed the sorted_by_local branch 3 times, most recently from 2bab2b4 to b38589f Compare January 21, 2021 17:26
@raunaqmorarka raunaqmorarka changed the title Avoid partial sort when reading pre-sorted input Replace partial topN with limit when reading pre-sorted input Jan 21, 2021
@raunaqmorarka raunaqmorarka requested a review from sopel39 January 22, 2021 10:04
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments

@sopel39
Copy link
Member

sopel39 commented Jan 22, 2021

mind automation

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add integration test to TestHiveIntegrationSmokeTest too what verifies plan with sort and limit

@@ -414,6 +414,16 @@ public PlanWithProperties visitTopN(TopNNode node, PreferredProperties preferred
break;
case PARTIAL:
child = planChild(node, PreferredProperties.any());
// If source is pre-sorted, partial topN can be replaced with partial Limit N
List<LocalProperty<Symbol>> desiredProperties = node.getOrderingScheme().getOrderBy().stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This most likely deserves a separate rule but such rule would have to derive properties too (which is not the rule way). I think it's fine to keep simplification here since it's rather trivial and all information are at hand. cc @martint

@raunaqmorarka
Copy link
Member Author

We could add integration test to TestHiveIntegrationSmokeTest too what verifies plan with sort and limit

Added testOptimizeSortedFileRead in TestHiveIntegrationSmokeTest which verifies change in plan with sorted hive table

@martint martint merged commit 4103acd into trinodb:master May 25, 2021
@martint martint added this to the 358 milestone May 25, 2021
@raunaqmorarka raunaqmorarka deleted the sorted_by_local branch May 26, 2021 02:53
@raunaqmorarka
Copy link
Member Author

For kicks, I hooked this up with the Phoenix connector and it works perfectly - with noticeable performance improvements.

@lhofhansl you can try this one now with Phoenix connector

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Use local limit instead of local Top N if stream is sorted on ordering column
7 participants