feat(cell_based_table): add sentinel column and cell_based muti_get #1590
Codecov Report
@@ Coverage Diff @@
## main #1590 +/- ##
=======================================
Coverage 71.37% 71.38%
=======================================
Files 600 600
Lines 77726 77721 -5
=======================================
+ Hits 55477 55479 +2
+ Misses 22249 22242 -7
.concat(); | ||
let state_store_get_res = self.keyspace.get(&key.clone(), epoch).await?; | ||
if let Some(state_store_get_res) = state_store_get_res { | ||
get_res.push((key.clone(), state_store_get_res)); |
pls check all unnecessary clones
@@ -32,7 +32,7 @@ use crate::util::sort_util::{OrderPair, OrderType}; | |||
use crate::util::value_encoding::serialize_cell; | |||
|
|||
/// The special `cell_id` reserved for a whole null row is `i32::MIN`. | |||
pub const NULL_ROW_SPECIAL_CELL_ID: ColumnId = ColumnId::new(i32::MIN); | |||
pub const SENTINEL_CELL_ID: ColumnId = ColumnId::new(-1_i32); |
IIRC, `-1` is because we want `SENTINEL_CELL_ID` to be at the very beginning when we do `scan`? If so, may add some comments to it.
May give some examples in the comments, e.g. normal row, all-none row, what the storage format would be like.
Sounds good!
Not sure how we are encoding the column id now. If simply using be or le encoding, -1 might not be at the beginning when scanning.
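To make the encoding concern concrete, here is a minimal sketch. It does not reflect how the project actually encodes column ids; `encode_plain_be` and `encode_memcomparable` are hypothetical helpers written only for this illustration. It shows that with a plain big-endian (or little-endian) two's complement encoding, `-1` becomes `0xFFFFFFFF` and sorts last under byte-wise comparison, while flipping the sign bit first gives a byte order that matches numeric order.

```rust
/// Plain big-endian two's complement bytes of an `i32` (hypothetical helper).
fn encode_plain_be(id: i32) -> [u8; 4] {
    id.to_be_bytes()
}

/// Order-preserving encoding (hypothetical helper): flip the sign bit,
/// then write big-endian. Byte-wise order then matches numeric order.
fn encode_memcomparable(id: i32) -> [u8; 4] {
    ((id as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Sort ids by the byte-wise order of their encoded form,
/// which is the order a key-ordered scan would return them in.
fn sorted_by_encoding(ids: &[i32], encode: fn(i32) -> [u8; 4]) -> Vec<i32> {
    let mut v = ids.to_vec();
    v.sort_by_key(|&id| encode(id));
    v
}

fn main() {
    let ids = [2, -1, 0, 1];
    // Plain encoding: -1 is 0xFFFFFFFF, so it comes last.
    println!("{:?}", sorted_by_encoding(&ids, encode_plain_be)); // [0, 1, 2, -1]
    // Sign-bit-flipped encoding: -1 comes first.
    println!("{:?}", sorted_by_encoding(&ids, encode_memcomparable)); // [-1, 0, 1, 2]
}
```

So whether the sentinel is seen first during a scan depends on the key encoding, not only on the choice of `-1`.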
result.push((key, None)); | ||
} | ||
} | ||
} | ||
if all_null { |
May check whether the comments for this whole function are obsolete.
@@ -298,7 +298,7 @@ impl<S: StateStore, const TOP_N_TYPE: usize> ManagedTopNState<S, TOP_N_TYPE> { | |||
let pk_row_bytes = self | |||
.keyspace | |||
.scan_strip_prefix( | |||
number_rows.map(|top_n_count| top_n_count * self.data_types.len()), | |||
number_rows.map(|top_n_count| top_n_count * (self.data_types.len() + 1)), |
So top-n will possibly scan more cells from storage? If there are nulls in storage, it will scan more than it needs. Hmm... better to have this refactored in the future. For example, use iterator-based APIs. (cc @lmatz)
Please also update the comments above
Yes, I talked to @wcy-fdu about this part requiring refactoring in the future.
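A rough sketch of why the cell-count limit can over-fetch, assuming a layout where each row stores one sentinel cell plus one cell per non-null column (null cells are not written). The functions `cells_in_row`, `scan_cell_limit`, and `rows_covered` are hypothetical and only model the arithmetic; they are not part of the codebase.

```rust
/// Cells stored for one row under the assumed layout:
/// one sentinel cell plus one cell per non-null column.
fn cells_in_row(row: &[Option<i64>]) -> usize {
    1 + row.iter().filter(|c| c.is_some()).count()
}

/// Cell limit as computed in the diff: `top_n_count * (column_count + 1)`.
fn scan_cell_limit(top_n_count: usize, column_count: usize) -> usize {
    top_n_count * (column_count + 1)
}

/// Number of rows a scan touches (fully or partially) when it reads
/// at most `limit` cells in row order.
fn rows_covered(rows: &[Vec<Option<i64>>], limit: usize) -> usize {
    let mut remaining = limit;
    let mut covered = 0;
    for row in rows {
        if remaining == 0 {
            break;
        }
        remaining = remaining.saturating_sub(cells_in_row(row));
        covered += 1;
    }
    covered
}

fn main() {
    let limit = scan_cell_limit(2, 2); // top-2 over 2 columns: 6 cells
    let dense = vec![vec![Some(1), Some(2)]; 4];
    let sparse = vec![
        vec![Some(1), None],    // 2 cells
        vec![None, None],       // 1 cell (sentinel only)
        vec![Some(3), Some(4)], // 3 cells
        vec![Some(5), Some(6)], // 3 cells
    ];
    println!("dense rows covered: {}", rows_covered(&dense, limit)); // 2
    println!("sparse rows covered: {}", rows_covered(&sparse, limit)); // 3
}
```

With nulls present, the same 6-cell limit reaches into a third row even though only two rows were requested, which is the extra scanning mentioned above. An iterator-based API could stop at row boundaries instead of at a cell count.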
@@ -235,7 +235,7 @@ mod tests { | |||
.await | |||
.unwrap() | |||
.len(), | |||
6 | |||
9 |
You can use the `get_row` API to refactor this test case now.
also update comments above
#[tokio::test] | ||
async fn test_get_row_by_muti_get() { | ||
let state_store = MemoryStateStore::new(); | ||
let column_ids = vec![ColumnId::from(0), ColumnId::from(1), ColumnId::from(2)]; |
Can we also have some test cases like:
- variable-length type? e.g. string
- shuffled column id? e.g., column id 3, 2, 1 instead of 1, 2, 3
- non-continuous ids? e.g., 1, 4, 8 instead of 1, 2, 3
Sounds good, I will add these cases and remove `get_for_test` in cell_based_table.
LGTM. Better also have a test case like write using `1, 2, 3` column id, and read using `3, 2`.
Good idea.
@@ -32,7 +32,7 @@ use crate::util::sort_util::{OrderPair, OrderType}; | |||
use crate::util::value_encoding::serialize_cell; | |||
|
|||
/// The special `cell_id` reserved for a whole null row is `i32::MIN`. |
Please update this doc as well. :)
.map_err(err)?; | ||
assert!(deserialize_res.is_none()); | ||
} | ||
let pk_and_row = cell_based_row_deserializer.take(); |
It seems we can `unwrap` here?
assert_eq!( | ||
memory_state_store | ||
.scan::<_, Vec<u8>>(.., None, u64::MAX) | ||
.await | ||
.unwrap() | ||
.len(), | ||
6 | ||
9 | ||
); | ||
|
||
// FIXME(Bugen): restore this test by using new `RowTable` interface |
Seems we can restore these tests with `get_row`.
What's changed and what's your intention?
This PR first adds a sentinel column during serialization/deserialization, using a None value for the sentinel cell. When state_store gets the sentinel column, the result falls into these cases:
- return `None`: the row does not exist
- return `Some(value)`: the row exists
Then cell_based `muti_get` is implemented based on the design above.
BTW, some unit tests have been modified due to the corresponding changes.
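The design described above can be sketched roughly as follows. This is a simplified model, not the code in this PR: `Store` is a hypothetical `BTreeMap` stand-in for the state store, `SENTINEL` mirrors `SENTINEL_CELL_ID`, and `insert_row` / `get_row` are illustrative names. It assumes null cells are simply not written and that the sentinel cell carries an empty value.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the state store: (pk, cell_id) -> encoded cell.
type Store = BTreeMap<(Vec<u8>, i32), Vec<u8>>;

/// Assumed sentinel id, mirroring `SENTINEL_CELL_ID` in the diff.
const SENTINEL: i32 = -1;

/// Write a row: always write the sentinel cell, and skip `None` cells.
fn insert_row(store: &mut Store, pk: &[u8], column_ids: &[i32], row: &[Option<Vec<u8>>]) {
    store.insert((pk.to_vec(), SENTINEL), Vec::new());
    for (id, cell) in column_ids.iter().zip(row) {
        if let Some(bytes) = cell {
            store.insert((pk.to_vec(), *id), bytes.clone());
        }
    }
}

/// Read a row with point gets only. The sentinel decides whether the row
/// exists; a missing non-sentinel cell is read back as a null column.
fn get_row(store: &Store, pk: &[u8], column_ids: &[i32]) -> Option<Vec<Option<Vec<u8>>>> {
    // No sentinel cell: the row does not exist.
    store.get(&(pk.to_vec(), SENTINEL))?;
    Some(
        column_ids
            .iter()
            .map(|id| store.get(&(pk.to_vec(), *id)).cloned())
            .collect(),
    )
}

fn main() {
    let mut store = Store::new();
    let ids = [1, 2, 3];
    insert_row(&mut store, b"k1", &ids, &[Some(b"a".to_vec()), None, Some(b"c".to_vec())]);
    insert_row(&mut store, b"k2", &ids, &[None, None, None]);

    // Read a subset of columns in a different order than they were written.
    println!("{:?}", get_row(&store, b"k1", &[3, 2]));
    // All-null row: it exists because its sentinel cell was written.
    println!("{:?}", get_row(&store, b"k2", &ids));
    // Never written: no sentinel, so no row.
    println!("{:?}", get_row(&store, b"k3", &ids));
}
```

The sentinel is what lets an all-null row be told apart from a row that was never written, without a scan.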
Checklist
Refer to a related PR or issue link (optional)