[BugFix] fix column overflow when handle too large partial update #49054
Signed-off-by: luohaha <[email protected]>
Does the cloud-native table also need this optimization?
Quality Gate passed
[FE Incremental Coverage Report] ✅ pass: 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass: 44 / 48 (91.67%)
@Mergifyio backport branch-3.3
@Mergifyio backport branch-3.2
✅ Backports have been created
…9054)
Signed-off-by: luohaha <[email protected]>
(cherry picked from commit 3b682c7)
# Conflicts:
# gensrc/proto/olap_file.proto

…9054)
Signed-off-by: luohaha <[email protected]>
(cherry picked from commit 3b682c7)
# Conflicts:
# be/src/storage/rowset/rowset.h
# be/src/storage/rowset_column_update_state.cpp
# gensrc/proto/olap_file.proto

…ckport #49054) (#49478)
Signed-off-by: Yixin Luo <[email protected]>
Co-authored-by: Yixin Luo <[email protected]>

…ckport #49054) (#49477)
Signed-off-by: Yixin Luo <[email protected]>
Co-authored-by: Yixin Luo <[email protected]>
Why I'm doing:
In the current implementation, a partial column update builds the column to be updated at segment granularity, which can overflow. For example, consider updating an ARRAY column. At first the table data contains no arrays yet, so segment files are cut at a size of 1GB and a single segment file may hold a very large number of rows; assume a tablet has 5 million rows. The ArrayColumn struct stores array offsets as uint32_t, so one ArrayColumn can hold at most 4,294,967,295 elements. Once the update fills those rows with arrays, building a single column for the whole segment can exceed that limit: with 5 million rows, an average of about 859 elements per array is already enough.
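The overflow threshold described above can be checked by comparing rows × average array length against the largest uint32_t value. The C++ sketch below is a hypothetical illustration, not StarRocks code: it assumes the final offset of the array column equals the total number of array elements, and the names kMaxUint32Offset, array_offsets_overflow and last_offset_as_uint32 are made up for this example.

```cpp
#include <cstdint>
#include <limits>

// Largest end offset a uint32_t offset array can represent: 4,294,967,295.
constexpr uint64_t kMaxUint32Offset = std::numeric_limits<uint32_t>::max();

// Hypothetical check: true if `rows` arrays averaging `avg_elements` elements
// each would push the last end offset past what a uint32_t can hold.
constexpr bool array_offsets_overflow(uint64_t rows, uint64_t avg_elements) {
    return rows * avg_elements > kMaxUint32Offset;
}

// Hypothetical helper: the value the last end offset ends up with when it is
// stored in a uint32_t. Conversion to an unsigned type is modulo 2^32, so the
// offset silently wraps instead of failing.
constexpr uint32_t last_offset_as_uint32(uint64_t rows, uint64_t avg_elements) {
    return static_cast<uint32_t>(rows * avg_elements);
}
```

For instance, 5 million rows averaging 1,000 elements need 5,000,000,000 elements, but a uint32_t offset wraps to 705,032,704, so offsets past the wrap point no longer describe the data.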
What I'm doing:
Updates involving large amounts of data are processed in batches, and the size of each batch is limited by partial_update_memory_limit_per_worker.
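The batching step can be sketched as follows. This is a minimal sketch under stated assumptions, not the PR's implementation: split_into_batches is a hypothetical helper, the per-row memory estimates in row_bytes are assumed to be known in advance, and memory_limit only stands in for partial_update_memory_limit_per_worker.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not StarRocks code): split rows [0, row_bytes.size())
// into contiguous [begin, end) batches whose estimated memory stays within
// memory_limit. A single row larger than the limit still forms its own batch,
// so the loop always makes progress.
std::vector<std::pair<size_t, size_t>> split_into_batches(const std::vector<uint64_t>& row_bytes,
                                                         uint64_t memory_limit) {
    std::vector<std::pair<size_t, size_t>> batches;
    size_t begin = 0;
    uint64_t batch_bytes = 0;
    for (size_t i = 0; i < row_bytes.size(); ++i) {
        // Close the current (non-empty) batch if adding row i would exceed the limit.
        if (i > begin && batch_bytes + row_bytes[i] > memory_limit) {
            batches.emplace_back(begin, i);
            begin = i;
            batch_bytes = 0;
        }
        batch_bytes += row_bytes[i];
    }
    // Flush the trailing batch, if any rows remain.
    if (begin < row_bytes.size()) {
        batches.emplace_back(begin, row_bytes.size());
    }
    return batches;
}
```

With row sizes {40, 40, 40, 100, 10} and a limit of 100, this yields the batches [0, 2), [2, 3), [3, 4) and [4, 5). Because every array element occupies memory, a byte limit also indirectly caps how many elements a single batch's column can accumulate.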
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: