
[BugFix] Fix column overflow in RowsetUpdateState and CompactionState #33246

Merged
merged 2 commits into from
Oct 24, 2023

Conversation

decster
Contributor

@decster decster commented Oct 20, 2023

Fixes #33247

When the compaction output file is very large and contains many rows, the buffer holding all encoded primary keys for that segment file can grow huge and overflow its 32-bit offsets. This PR fixes the issue by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding overflow checks, and refactoring some of the compaction memory logs.
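As a rough illustration of the idea (not the actual patch; the make_pk_column helper and the size estimate below are hypothetical, while BinaryColumn and LargeBinaryColumn are the StarRocks column types this PR switches between, assumed to come from the column/binary_column.h header and the starrocks namespace):

    #include <cstdint>
    #include <memory>

    #include "column/binary_column.h"  // assumed header for BinaryColumn / LargeBinaryColumn

    // Hypothetical sketch: pick 64-bit offsets when the estimated total size of the
    // encoded primary keys could exceed what 32-bit offsets can address.
    std::unique_ptr<Column> make_pk_column(size_t num_rows, size_t avg_encoded_pk_bytes) {
        const uint64_t estimated_bytes = static_cast<uint64_t>(num_rows) * avg_encoded_pk_bytes;
        if (estimated_bytes >= UINT32_MAX) {
            return std::make_unique<LargeBinaryColumn>();  // 64-bit offsets
        }
        return std::make_unique<BinaryColumn>();  // 32-bit offsets
    }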

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels for the target branches to which this PR will be auto-backported
    • 3.2
    • 3.1
    • 3.0
    • 2.5

            *pcolumn = std::make_unique<LargeBinaryColumn>();
        } else {
            *pcolumn = std::make_unique<BinaryColumn>();
        }
    }
    return Status::OK();
}

This code seems good overall, but there are a few potential improvements that could be made:

  1. Code Redundancy: The condition on whether large_column is true or false is repeated for both TYPE_VARCHAR and some other column type(s). You could extract the assignment into a separate function or method to reduce redundancy.

  2. Function Documentation/Comments: There are no comments or documentation for the modified functions. It's important to explain what each parameter does and what the function returns; this helps others who read your code understand it more quickly and thoroughly.

  3. Error Handling: In the create_column function, it's unclear what happens if the type is not one of the pre-specified ones. A default case or an additional return status for unsupported types might be useful (a sketch follows this comment).

  4. Variable Naming: Although this is subjective, bool variables are often more readable when named as a question or statement of state; e.g., is_large_column might be clearer than large_column.

  5. Magic Constants: Constants like TYPE_INT128, TYPE_VARCHAR, etc. would benefit from documentation where they are declared, so developers know what these values mean and what they are used for.

  6. Unit Testing: Ensure that these changes are adequately tested, particularly around the handling of large columns.

Here's an example refactor addressing point 1:

std::unique_ptr<Column> define_type(bool large_column) {
    if (large_column) {
        return std::make_unique<LargeBinaryColumn>();
    } else {
        return std::make_unique<BinaryColumn>();
    }
}

// In the switch-case and the else block:
*pcolumn = define_type(large_column);

Remember that code review suggestions can vary between reviewers based on their experiences, coding styles, and the specific requirements of the project.
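For point 3, a minimal sketch of what a guarded default case could look like (a hypothetical helper, not the PR's code; LogicalType, TYPE_VARCHAR, Status::NotSupported, and the column types are assumed from the StarRocks code base):

    // Hypothetical sketch: surface unsupported key types as an error status
    // instead of leaving *pcolumn untouched.
    static Status create_binary_pk_column(LogicalType type, bool large_column,
                                          std::unique_ptr<Column>* pcolumn) {
        switch (type) {
        case TYPE_VARCHAR:
            if (large_column) {
                *pcolumn = std::make_unique<LargeBinaryColumn>();
            } else {
                *pcolumn = std::make_unique<BinaryColumn>();
            }
            return Status::OK();
        default:
            return Status::NotSupported("create_column: unsupported primary key type");
        }
    }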

static Status create_column(const Schema& schema, std::unique_ptr<Column>* pcolumn,
-                            const std::vector<ColumnId>& key_idxes);
+                            const std::vector<ColumnId>& key_idxes, bool large_column = false);

static void encode(const Schema& schema, const Chunk& chunk, size_t offset, size_t len, Column* dest);


The code you've provided seems well-structured and follows decent practices. However, a few suggestions might improve its readability, maintainability, and flexibility for future changes:

  1. Consistent documentation: The added comments are welcome, but the documentation style could be more consistent. Right now, comments describe inputs inline but use new lines for the static method descriptions. You might also consider a documentation standard like Doxygen or Javadoc to make the comments more structured, clear, and usable.

  2. Overloads of create_column: There are two versions of create_column with different parameters. If the functionality is mostly the same between them, consider refactoring to remove the duplication while preserving the flexibility of the overloading. Also ensure that both methods behave consistently for the same initial set of parameters, to prevent bugs and confusion.

  3. Boolean trap: In the create_column function, the last parameter is a boolean. This can lead to a "boolean trap", where the meaning of true/false is unclear at the call site. An enum with clearly named values could be used instead to denote what this flag represents (see the sketch after this comment).

  4. Error checking: There's no clear sign of how potential errors are managed. It would be helpful to have some error checking, at least for critical parts. For instance, before creating the column, it might be useful to verify whether the given schema is valid.

  5. Usage of std::unique_ptr<Column>* pcolumn: Passing the smart pointer by reference might be preferable to passing it by pointer, while still signaling that this argument can be modified.

Remember, these points are meta-suggestions based on the part of the code you have shared. The actual improvements may differ based on the complete context and implementation of these methods. Further assessment should also consider performance and testing behavior.
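For point 3 (the boolean trap), a small sketch of the enum alternative (the ColumnWidth enum and the adjusted signature are hypothetical, not part of this PR):

    // Hypothetical alternative to the bool flag: the intent is visible at every call site.
    enum class ColumnWidth { kNormal, kLarge };

    static Status create_column(const Schema& schema, std::unique_ptr<Column>* pcolumn,
                                const std::vector<ColumnId>& key_idxes,
                                ColumnWidth width = ColumnWidth::kNormal);

    // A call site then reads unambiguously, e.g.:
    // RETURN_IF_ERROR(create_column(schema, &pk_column, key_idxes, ColumnWidth::kLarge));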

@decster decster force-pushed the bugfix-compaction-state-overflow branch from 2f9786a to 2ff0c20 on October 20, 2023 08:08
@decster decster force-pushed the bugfix-compaction-state-overflow branch 2 times, most recently from 1fae91c to da81594 on October 23, 2023 05:12
Signed-off-by: Binglin Chang <[email protected]>
@decster decster force-pushed the bugfix-compaction-state-overflow branch from da81594 to 9082438 on October 23, 2023 06:55
@wanpengfei-git
Collaborator

[FE Incremental Coverage Report]

😍 pass : 0 / 0 (0%)

@wanpengfei-git
Collaborator

[BE Incremental Coverage Report]

😞 fail : 98 / 166 (59.04%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/storage/update_compaction_state.cpp 1 7 14.29% [130, 131, 132, 138, 139, 140]
🔵 be/src/storage/primary_key_encoder.cpp 89 151 58.94% [516, 517, 518, 519, 520, 521, 522, 525, 528, 529, 530, 531, 532, 534, 535, 538, 571, 572, 573, 574, 575, 576, 577, 578, 580, 652, 653, 654, 655, 656, 657, 658, 664, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 719, 720]
🔵 be/src/storage/persistent_index.cpp 2 2 100.00% []
🔵 be/src/storage/lake/rowset_update_state.cpp 1 1 100.00% []
🔵 be/src/storage/lake/update_compaction_state.cpp 1 1 100.00% []
🔵 be/src/storage/primary_index.cpp 2 2 100.00% []
🔵 be/src/column/binary_column.cpp 1 1 100.00% []
🔵 be/src/storage/rowset_update_state.cpp 1 1 100.00% []

@decster decster merged commit 4a2c85a into StarRocks:main Oct 24, 2023
43 of 44 checks passed
@github-actions

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Oct 24, 2023
@github-actions

@Mergifyio backport branch-3.1

@github-actions

@Mergifyio backport branch-3.0

@github-actions

@Mergifyio backport branch-2.5

@github-actions github-actions bot removed the 2.5 label Oct 24, 2023
@mergify
Contributor

mergify bot commented Oct 24, 2023

backport branch-3.2

✅ Backports have been created

@mergify
Contributor

mergify bot commented Oct 24, 2023

backport branch-3.1

✅ Backports have been created

@mergify
Contributor

mergify bot commented Oct 24, 2023

backport branch-3.0

✅ Backports have been created

@mergify
Contributor

mergify bot commented Oct 24, 2023

backport branch-2.5

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Oct 24, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)
mergify bot pushed a commit that referenced this pull request Oct 24, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)

# Conflicts:
#	be/src/storage/lake/update_compaction_state.cpp
mergify bot pushed a commit that referenced this pull request Oct 24, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)

# Conflicts:
#	be/src/storage/lake/update_compaction_state.cpp
mergify bot pushed a commit that referenced this pull request Oct 24, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)

# Conflicts:
#	be/src/storage/lake/rowset_update_state.cpp
#	be/src/storage/lake/update_compaction_state.cpp
#	be/src/storage/primary_key_encoder.cpp
#	be/src/storage/primary_key_encoder.h
#	be/src/storage/rowset_update_state.cpp
#	be/src/storage/update_compaction_state.cpp
wanpengfei-git pushed a commit that referenced this pull request Oct 25, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)
decster added a commit that referenced this pull request Oct 26, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)
wanpengfei-git pushed a commit that referenced this pull request Oct 26, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)
decster added a commit that referenced this pull request Oct 26, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)
wanpengfei-git pushed a commit that referenced this pull request Oct 26, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)
decster added a commit that referenced this pull request Oct 26, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)
wanpengfei-git pushed a commit that referenced this pull request Oct 27, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)
@luohaha
Contributor

luohaha commented Nov 28, 2023

https://github.com/Mergifyio backport branch-3.1-cs

Contributor

mergify bot commented Nov 28, 2023

backport branch-3.1-cs

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Nov 28, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)

# Conflicts:
#	be/src/storage/lake/update_compaction_state.cpp
mergify bot pushed a commit that referenced this pull request Nov 28, 2023
…#33246)

When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs.

Fixes #33247

Signed-off-by: Binglin Chang <[email protected]>
(cherry picked from commit 4a2c85a)
(cherry picked from commit 707c816)

Successfully merging this pull request may close these issues.

actual row size changed after compaction
5 participants