refactor: refine replace into pruning for table with cluster keys #12147
Merged
BohuTANG
merged 12 commits into
databendlabs:main
from
dantengsky:refactor-replace-into-pruning
Jul 24, 2023
dantengsky
changed the title
refactor: refine replace into pruning
refactor: refine replace into pruning for table with cluster keys
Jul 24, 2023
github-actions
bot
added
the
pr-refactor
this PR changes the code base without new features or bugfix
label
Jul 24, 2023
SkyFan2002
reviewed
Jul 24, 2023
BohuTANG
reviewed
Jul 24, 2023
...ages/fuse/src/operations/replace_into/processors/transform_merge_into_mutation_aggregator.rs
SkyFan2002
reviewed
Jul 24, 2023
src/query/storages/fuse/src/operations/replace_into/mutator/mutator_replace_into.rs
BohuTANG
approved these changes
Jul 24, 2023
zhyass
approved these changes
Jul 24, 2023
LGTM
SkyFan2002
added a commit
to SkyFan2002/databend
that referenced
this pull request
Jul 26, 2023
BohuTANG
pushed a commit
that referenced
this pull request
Aug 10, 2023
* refactor copy into * fix panic * fix * fix * fix * make lint * fix logic error * replace into values * fix * fix * fix render result * fix schema cast * temp * respect #12147 * respect #12100 * make lint * respect #12130 * fix merge * add exchange * fix conflict * fix schema cast * fix conlfict * fix * fix copy plan * clear log * fix copy * fix copy * run ci * fix purge * make lint * add exchange * disable dist for value source * adjust exchange * remove top exchange * adjust replace into * reshuffle * fix * fix reshuffle * move segment_partition_num * resolve conflicts * add need insert flag * unbranched_replace_into_processor * merge only pipeline * fix segment index * fix conflict * remove log * fix empty table * fix stateful test * fix stateful test * modify test * fix typo * fix random source * add setting * remove empty file * remove dead code * add default setting * Update src/query/service/src/interpreters/interpreter_replace.rs Co-authored-by: dantengsky <[email protected]> * Update src/query/sql/src/executor/physical_plan_display.rs Co-authored-by: dantengsky <[email protected]> * Update src/query/storages/fuse/src/operations/replace_into/processors/processor_unbranched_replace_into.rs Co-authored-by: dantengsky <[email protected]> * Update src/query/sql/src/executor/physical_plan.rs Co-authored-by: dantengsky <[email protected]> * Update src/query/sql/src/executor/physical_plan_display.rs Co-authored-by: dantengsky <[email protected]> * rename struct * default 0 * regen golden file * set enable_distributed_replace_into = 1 in slt * make lint --------- Co-authored-by: dantengsky <[email protected]> Co-authored-by: JackTan25 <[email protected]>
andylokandy
pushed a commit
to andylokandy/databend
that referenced
this pull request
Nov 27, 2023
…tabendlabs#12147) * refactor: refine replace into pruning * parition rows (WIP) * partition by left most cluster key * more metric * add new setting enable_replace_into_partitioning * refine merge_into_mutator * only un-compact the segment info when necessary * minor gc * chore * adjust metric * fix typos
andylokandy
pushed a commit
to andylokandy/databend
that referenced
this pull request
Nov 27, 2023
…2119) * refactor copy into * fix panic * fix * fix * fix * make lint * fix logic error * replace into values * fix * fix * fix render result * fix schema cast * temp * respect databendlabs#12147 * respect databendlabs#12100 * make lint * respect databendlabs#12130 * fix merge * add exchange * fix conflict * fix schema cast * fix conlfict * fix * fix copy plan * clear log * fix copy * fix copy * run ci * fix purge * make lint * add exchange * disable dist for value source * adjust exchange * remove top exchange * adjust replace into * reshuffle * fix * fix reshuffle * move segment_partition_num * resolve conflicts * add need insert flag * unbranched_replace_into_processor * merge only pipeline * fix segment index * fix conflict * remove log * fix empty table * fix stateful test * fix stateful test * modify test * fix typo * fix random source * add setting * remove empty file * remove dead code * add default setting * Update src/query/service/src/interpreters/interpreter_replace.rs Co-authored-by: dantengsky <[email protected]> * Update src/query/sql/src/executor/physical_plan_display.rs Co-authored-by: dantengsky <[email protected]> * Update src/query/storages/fuse/src/operations/replace_into/processors/processor_unbranched_replace_into.rs Co-authored-by: dantengsky <[email protected]> * Update src/query/sql/src/executor/physical_plan.rs Co-authored-by: dantengsky <[email protected]> * Update src/query/sql/src/executor/physical_plan_display.rs Co-authored-by: dantengsky <[email protected]> * rename struct * default 0 * regen golden file * set enable_distributed_replace_into = 1 in slt * make lint --------- Co-authored-by: dantengsky <[email protected]> Co-authored-by: JackTan25 <[email protected]>
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
enable table level pruning
Rows of the input data blocks that definitely do not conflict with the table data are filtered out before conflict detection.
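The idea of table-level pruning can be illustrated with a minimal sketch. It assumes a single integer conflict-detection column and a table-wide min/max range for it. The names `ColumnRange` and `rows_needing_conflict_check` are hypothetical and are not taken from this PR's code.

```rust
/// Hypothetical table-wide min/max statistics for one conflict-detection
/// column. Illustrative only; these are not the types used in the PR.
#[derive(Debug, Clone, Copy)]
struct ColumnRange {
    min: i64,
    max: i64,
}

/// Returns the indices of input rows that may still conflict with table data.
/// A row whose key lies outside the table-wide [min, max] range cannot match
/// any existing row, so it can skip conflict detection entirely.
fn rows_needing_conflict_check(keys: &[i64], table_range: Option<ColumnRange>) -> Vec<usize> {
    match table_range {
        // Assumption: without statistics we stay conservative and keep every row.
        None => (0..keys.len()).collect(),
        Some(range) => keys
            .iter()
            .enumerate()
            .filter(|&(_, &key)| key >= range.min && key <= range.max)
            .map(|(idx, _)| idx)
            .collect(),
    }
}
```

In this sketch, the length of `keys` and the length of the returned vector would correspond roughly to the row counts before and after table-level pruning.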
partition input data by the leftmost cluster key
If the table defines cluster keys, the input data (after table-level pruning) is partitioned by the leftmost cluster key expression. In the later phase, each partition provides tighter bounds for range pruning, so hopefully fewer data blocks need to be loaded.
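A rough sketch of why partitioning tightens the pruning bounds follows. It assumes integer values for the leftmost cluster key and splits the sorted input into equally sized contiguous chunks. The partitioning strategy in the PR may differ, and `Bounds`, `partition_bounds`, and `blocks_to_load` are hypothetical names.

```rust
/// Min/max bounds of the leftmost cluster key, for either a partition of
/// input rows or a table block. Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    min: i64,
    max: i64,
}

impl Bounds {
    /// Two closed ranges overlap when each starts before the other ends.
    fn overlaps(&self, other: &Bounds) -> bool {
        self.min <= other.max && other.min <= self.max
    }
}

/// Sorts the input keys and splits them into at most `partitions` contiguous
/// chunks, returning the min/max bounds of each chunk.
fn partition_bounds(mut keys: Vec<i64>, partitions: usize) -> Vec<Bounds> {
    if keys.is_empty() || partitions == 0 {
        return Vec::new();
    }
    keys.sort_unstable();
    // Ceiling division, so that no more than `partitions` chunks are produced.
    let chunk_len = (keys.len() + partitions - 1) / partitions;
    keys.chunks(chunk_len)
        .map(|chunk| Bounds { min: chunk[0], max: chunk[chunk.len() - 1] })
        .collect()
}

/// Returns the indices of table blocks whose range overlaps at least one
/// input partition; only these blocks would have to be loaded.
fn blocks_to_load(blocks: &[Bounds], inputs: &[Bounds]) -> Vec<usize> {
    blocks
        .iter()
        .enumerate()
        .filter(|&(_, block)| inputs.iter().any(|input| input.overlaps(block)))
        .map(|(idx, _)| idx)
        .collect()
}
```

For input keys 1, 2, 3, 100, 101, 102 and table blocks covering [0, 10], [40, 60], and [95, 110], a single partition has bounds [1, 102] and overlaps all three blocks. Two partitions have bounds [1, 3] and [100, 102], so the middle block no longer needs to be loaded.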
new setting
enable_replace_into_partitioning
Setting it to 0 disables the partitioning of input data during execution of the replace-into statement for tables that have cluster keys.
new metrics
replace_into_original_row_number
replace_into_row_number_after_table_level_pruning
replace_into_partition_number
This PR has also been tested using https://github.com/dantengsky/rr (ec2 + gp2).
demo scenario
@SkyFan2002
I hope this PR will not conflict too much with #12119. If anything needs to be adjusted, please let me know.