-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Limit the number of partitions initially opened by expression partition ingestion #47976
Conversation
825afcb
to
5e4d5d3
Compare
1d0f825
to
a50d1c7
Compare
…ession partition ingestion Signed-off-by: meegoo <[email protected]>
a50d1c7
to
eccc6d7
Compare
Quality Gate passedIssues Measures |
[FE Incremental Coverage Report]✅ pass : 10 / 11 (90.91%) file detail
|
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
need backport to 3.3 & 3.2 & 3.1 |
* Used to limit num of partition for load open partition number | ||
*/ | ||
@ConfField(mutable = true) | ||
public static long max_load_initial_open_partition_number = 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should some partitions initially opened for data loading in an expression partitioned table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For most scenarios, data is often only loaded into the nearest partition. Opening it in advance can avoid additional RPC requests at runtime.
@mergify backport branch-3.3 |
✅ Backports have been created
|
…ession partition ingestion (#47976) Signed-off-by: meegoo <[email protected]> (cherry picked from commit 92e29e1)
…ession partition ingestion (backport #47976) (#52619) Co-authored-by: meegoo <[email protected]>
Why I'm doing:
Currently, when we ingest data, unless a partition is specified, information for all partitions is sent, causing the storage side to open delta writers for all partitions. When there are many partitions, this can consume a large amount of unnecessary memory, even though only a few partitions typically have data written to them.
What I'm doing:
This optimization controls the number of partitions sent, allowing the storage side to initially open fewer delta writers. Subsequently, using a mechanism similar to expression partitioning, it dynamically opens the required partitions based on the data during runtime. This approach can significantly reduce memory consumption.
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: