[BugFix] Fix column overflow in RowsetUpdateState and CompactionState #33246
                *pcolumn = std::make_unique<LargeBinaryColumn>();
            } else {
                *pcolumn = std::make_unique<BinaryColumn>();
            }
        }
        return Status::OK();
    }
This code seems good overall, but there are a few potential improvements:

1. Code redundancy: The check on whether `large_column` is true or false is repeated for both `TYPE_VARCHAR` and some other column type(s). You could extract the assignment into a separate function or method to reduce redundancy.
2. Function documentation/comments: The modified functions have no comments or documentation. It is important to explain what each parameter does and what the function returns, so that readers can understand the code more quickly and thoroughly.
3. Error handling: In `create_column`, it is unclear what happens if `type` is not one of the pre-specified ones. A default case or an additional return status for an unsupported `type` might be useful.
4. Variable naming: Although this is subjective, bool variables are often more readable when named as a question or a statement of state, e.g. `is_large_column` might be clearer than `large_column`.
5. Magic constants: Constants such as `TYPE_INT128` and `TYPE_VARCHAR` would benefit from documentation at the place where they are declared, so developers know what these values mean and what they are used for.
6. Unit testing: Ensure that these changes are adequately tested, particularly around the handling of large columns.
Here's an example refactor addressing point 1:
    std::unique_ptr<Column> define_type(bool large_column) {
        if (large_column) {
            return std::make_unique<LargeBinaryColumn>();
        } else {
            return std::make_unique<BinaryColumn>();
        }
    }

    // In the switch-case and the else block:
    *pcolumn = define_type(large_column);
Remember that code review suggestions can vary between reviewers based on their experiences, coding styles, and the specific requirements of the project.
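Points 1 and 3 above can be combined in one small sketch. This is only an illustration under stated assumptions: `LogicalType`, `Column`, `BinaryColumn`, `LargeBinaryColumn`, and `Status` below are simplified stand-ins written for this example, not the project's real classes, and `make_binary_column` and `create_column_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the project's types (assumptions, not the real classes).
enum class LogicalType { TYPE_INT, TYPE_VARCHAR, TYPE_JSON };

struct Column {
    virtual ~Column() = default;
    virtual std::string name() const = 0;
};
struct BinaryColumn : Column {
    std::string name() const override { return "BinaryColumn"; }
};
struct LargeBinaryColumn : Column {
    std::string name() const override { return "LargeBinaryColumn"; }
};

struct Status {
    bool ok_flag;
    std::string message;
    static Status OK() { return {true, ""}; }
    static Status NotSupported(std::string msg) { return {false, std::move(msg)}; }
    bool ok() const { return ok_flag; }
};

// Point 1: one place that decides between the two binary column widths.
std::unique_ptr<Column> make_binary_column(bool large_column) {
    if (large_column) {
        return std::make_unique<LargeBinaryColumn>();
    }
    return std::make_unique<BinaryColumn>();
}

// Point 3: an explicit default case reports unsupported types instead of
// silently leaving *pcolumn untouched.
Status create_column_sketch(LogicalType type, std::unique_ptr<Column>* pcolumn, bool large_column) {
    switch (type) {
    case LogicalType::TYPE_VARCHAR:
        *pcolumn = make_binary_column(large_column);
        return Status::OK();
    default:
        return Status::NotSupported("unsupported key column type");
    }
}
```

Callers would check the returned status before using the column, so an unsupported type surfaces as an error rather than as a null pointer dereference later on.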
      static Status create_column(const Schema& schema, std::unique_ptr<Column>* pcolumn,
    -                             const std::vector<ColumnId>& key_idxes);
    +                             const std::vector<ColumnId>& key_idxes, bool large_column = false);

      static void encode(const Schema& schema, const Chunk& chunk, size_t offset, size_t len, Column* dest);
The code you've provided seems well-structured and follows decent practices. However, a few changes might improve its readability, maintainability, and flexibility for future changes:

1. Consistent documentation: While the addition of comments is excellent, the documentation style could be more consistent. Right now, comments describe inputs inline but use new lines for the static method descriptions. You might also consider a documentation standard like Doxygen or Javadoc to make comments more structured, clear, and usable.
2. Overloads of `create_column`: There are two versions of `create_column` with different parameters. If the functionality is mostly the same between them, consider refactoring to remove duplication while preserving the flexibility that overloading provides. You also need to ensure that both methods behave consistently for the same initial set of parameters, to prevent bugs and confusion.
3. Boolean trap: In the `create_column` function, the last parameter is a boolean. This can lead to a "boolean trap", where the meaning of true/false is unclear at the call site. An enum with clearly named values could be used instead to denote what this flag represents.
4. Error checking: There is no clear sign of how potential errors are managed. Some error checking would be useful, at least in critical parts. For instance, before creating the column, it might be worth verifying that the given schema is valid.
5. Usage of `std::unique_ptr<Column>* pcolumn`: Passing a smart pointer by reference might be preferable to passing it by pointer. Either form signals that the argument can be modified, but a reference cannot be null.
Remember, these points are meta-suggestions based on the part of the code you have shared. The actual improvements might be different based on complete context and actual implementation of these methods. Further assessment should also consider performance and testing behaviors.
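To make the "boolean trap" suggestion concrete, here is a minimal sketch that replaces the flag with an enum. The names `OffsetWidth` and `make_key_column`, and the simplified `Column` classes, are assumptions made for this example and are not part of the PR.

```cpp
#include <cassert>
#include <memory>

// Hypothetical enum replacing `bool large_column`; the values name the intent.
enum class OffsetWidth { k32Bit, k64Bit };

// Simplified stand-ins for the project's column classes (assumptions).
struct Column {
    virtual ~Column() = default;
    virtual int offset_bytes() const = 0;
};
struct BinaryColumn : Column {
    int offset_bytes() const override { return 4; }  // 32-bit offsets
};
struct LargeBinaryColumn : Column {
    int offset_bytes() const override { return 8; }  // 64-bit offsets
};

// The call site now reads make_key_column(OffsetWidth::k64Bit)
// instead of an opaque `true`.
std::unique_ptr<Column> make_key_column(OffsetWidth width) {
    switch (width) {
    case OffsetWidth::k64Bit:
        return std::make_unique<LargeBinaryColumn>();
    case OffsetWidth::k32Bit:
        return std::make_unique<BinaryColumn>();
    }
    return nullptr;  // only reached if an invalid enum value is passed
}
```

The trade-off is a slightly larger API surface in exchange for call sites that document themselves; a default argument such as `bool large_column = false` keeps existing callers compiling, which may be why the PR chose it.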
2f9786a to 2ff0c20
Signed-off-by: Binglin Chang <[email protected]>
1fae91c to da81594
Signed-off-by: Binglin Chang <[email protected]>
da81594 to 9082438
[FE Incremental Coverage Report] 😍 pass: 0 / 0 (0%)
[BE Incremental Coverage Report] 😞 fail: 98 / 166 (59.04%)
@Mergifyio backport branch-3.2
@Mergifyio backport branch-3.1
@Mergifyio backport branch-3.0
@Mergifyio backport branch-2.5
✅ Backports have been created
…#33246) When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs. Fixes #33247 Signed-off-by: Binglin Chang <[email protected]> (cherry picked from commit 4a2c85a)
…#33246) When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs. Fixes #33247 Signed-off-by: Binglin Chang <[email protected]> (cherry picked from commit 4a2c85a) # Conflicts: # be/src/storage/lake/update_compaction_state.cpp
…#33246) When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs. Fixes #33247 Signed-off-by: Binglin Chang <[email protected]> (cherry picked from commit 4a2c85a) # Conflicts: # be/src/storage/lake/rowset_update_state.cpp # be/src/storage/lake/update_compaction_state.cpp # be/src/storage/primary_key_encoder.cpp # be/src/storage/primary_key_encoder.h # be/src/storage/rowset_update_state.cpp # be/src/storage/update_compaction_state.cpp
@Mergifyio backport branch-3.1-cs
✅ Backports have been created
…#33246) When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing BinaryColumn<uint32> to BinaryColumn<int64>, adding some checks for overflow, and refactoring some compaction memory logs. Fixes #33247 Signed-off-by: Binglin Chang <[email protected]> (cherry picked from commit 4a2c85a) (cherry picked from commit 707c816)
Fixes #33247
When the compaction output file is too large and contains many rows, the resulting buffer containing all encoded pk in this segment file may be huge and overflow. This PR fixes this by changing `BinaryColumn<uint32>` to `BinaryColumn<int64>`, adding some checks for overflow, and refactoring some compaction memory logs.
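The overflow described in the PR summary can be illustrated with a small, self-contained check. This is a sketch of the general idea only, not the PR's actual code: the helper names `would_overflow_uint32` and `needs_large_column` are hypothetical, and it assumes the buffer is addressed by 32-bit byte offsets, so it can hold at most 4,294,967,295 bytes of encoded keys.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest byte count a 32-bit offset can address.
constexpr std::uint64_t kMaxUint32 = std::numeric_limits<std::uint32_t>::max();

// True if appending `extra_bytes` to a buffer that already holds
// `current_bytes` would exceed what a 32-bit offset can represent.
bool would_overflow_uint32(std::uint64_t current_bytes, std::uint64_t extra_bytes) {
    return current_bytes > kMaxUint32 || extra_bytes > kMaxUint32 - current_bytes;
}

// True if num_rows * bytes_per_key exceeds the 32-bit limit. The division
// form avoids overflowing the 64-bit multiplication itself.
bool needs_large_column(std::uint64_t num_rows, std::uint64_t bytes_per_key) {
    if (bytes_per_key == 0) {
        return false;
    }
    return num_rows > kMaxUint32 / bytes_per_key;
}
```

For example, 300 million rows with 16-byte encoded keys need about 4.8 GB of key data, which is past the 32-bit limit, while 100 million such rows (about 1.6 GB) still fit.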