feat: apply relational refactor for hash agg (max, min) #2999
c0a5902 to 2a7e515
4894a77 to e37abc6
Codecov Report
@@ Coverage Diff @@
## main #2999 +/- ##
==========================================
+ Coverage 73.15% 73.23% +0.07%
==========================================
Files 756 748 -8
Lines 102521 101766 -755
==========================================
- Hits 75003 74525 -478
+ Misses 27518 27241 -277
4b43658 to 5c476f8
5c476f8 to e165024
src/storage/src/table/state_table.rs
Outdated
@@ -204,6 +204,10 @@ impl<S: StateStore> StateTable<S> {
        self.iter_with_encoded_key_bounds(encoded_key_bounds, epoch)
            .await
    }

    pub fn is_dirty(&self) -> bool {
        !self.mem_table.buffer.is_empty()
Seems we can remove `get_mem_table()`?
Indeed. NTFS
There is a lot to be refactored or removed. Tracked in #3235.
@@ -317,42 +357,26 @@ where
    }

    /// Flush the internal state to a write batch.
    fn flush_inner(&mut self, write_batch: &mut WriteBatch<S>) -> StreamExecutorResult<()> {
The flush is now very clean. That's why I think we should maybe remove `flush` on ManagedState in the future.
Add a TODO here? The original doc seems confusing now. 😄
Generally LGTM!
0bc7a0e to d8b15d2
Generally LGTM.
@@ -124,23 +130,22 @@ where
    /// always be retained when flushing the managed state. Otherwise, we will only retain n entries
    /// after each flush.
    pub async fn new(
-       keyspace: Keyspace<S>,
+       _keyspace: Keyspace<S>,
We may remove the generic from the struct and add it to every function that accepts the state table, so that there's no need for phantom data either.
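A minimal sketch of this suggestion, assuming simplified stand-in types: `StateStore`, `MemoryStore`, `StateTable`, `ManagedExtremeState` and `fill_cache` below are hypothetical and do not mirror the real definitions. The point it illustrates is that once the store type parameter lives on the method rather than on the struct, the struct has no unused parameter and so no `PhantomData` field.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a state store backend.
pub trait StateStore {
    fn name(&self) -> &'static str;
}

/// Hypothetical in-memory backend used only for this sketch.
pub struct MemoryStore;

impl StateStore for MemoryStore {
    fn name(&self) -> &'static str {
        "memory"
    }
}

/// Hypothetical stand-in for the relational state table.
pub struct StateTable<S: StateStore> {
    pub store: S,
    pub rows: BTreeMap<i64, i64>,
}

/// The struct carries no `S` parameter, so no `PhantomData<S>` is required.
pub struct ManagedExtremeState {
    cache: BTreeMap<i64, i64>,
}

impl ManagedExtremeState {
    pub fn new() -> Self {
        Self { cache: BTreeMap::new() }
    }

    /// The generic is declared on the method that actually touches the table.
    /// Returns the number of cached entries after the fill.
    pub fn fill_cache<S: StateStore>(&mut self, table: &StateTable<S>) -> usize {
        self.cache.extend(table.rows.iter().map(|(k, v)| (*k, *v)));
        self.cache.len()
    }
}
```

The trade-off is that every method taking the table repeats the `S: StateStore` bound, which is more verbose but keeps the state type independent of the storage backend.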
            .await?;
        pin_mut!(all_data_iter);

        for _ in 0..self.top_n_count.unwrap_or(usize::MAX) {
If `top_n_count` is `None`, we should keep nothing in the cache.
Maybe it should not be an `Option`... Let's change it in another PR.
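To make the two readings of `None` concrete, here is a small sketch. Both function names are hypothetical, and a plain iterator of `i64` stands in for the state table scan. `fill_cache_unbounded` mirrors the `unwrap_or(usize::MAX)` loop under review, while `fill_cache_empty_on_none` follows the suggestion that `None` should cache nothing.

```rust
/// Behaviour of the loop under review: `None` becomes `usize::MAX`,
/// so the whole input ends up in the cache.
pub fn fill_cache_unbounded(
    sorted: impl Iterator<Item = i64>,
    top_n_count: Option<usize>,
) -> Vec<i64> {
    sorted.take(top_n_count.unwrap_or(usize::MAX)).collect()
}

/// Reviewer's suggestion: `None` means nothing is kept in the cache.
pub fn fill_cache_empty_on_none(
    sorted: impl Iterator<Item = i64>,
    top_n_count: Option<usize>,
) -> Vec<i64> {
    sorted.take(top_n_count.unwrap_or(0)).collect()
}
```

With `Some(n)` the two variants agree; they differ only for `None`, which is why replacing the `Option` with a plain count would remove the ambiguity.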
    /// TODO: Remove this soon.
    serializer: ExtremeSerializer<A::OwnedItem, EXTREME_TYPE>,
Let's store the pk length and it seems we can remove this now. Then the whole module can be cleaned up as well.
@@ -279,23 +280,29 @@ impl<K: HashKey, S: StateStore> HashAggExecutor<K, S> {
                input_pk_data_types.clone(),
                epoch,
                Some(hash_code),
-               state_tables,
+               &*state_tables.read().await,
With a `RwLock`, it seems the fetch process cannot run concurrently if some task is applying the chunk with a write guard?
We may move the apply part outside of the future, or introduce a new rwlock-wrapped state table so that there's no need to pass state tables as arguments everywhere. 😁
+1
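The concern above can be reproduced in isolation. The sketch below uses the blocking `std::sync::RwLock` rather than whichever async lock the executor awaits on, and `rwlock_contention_demo` is a hypothetical name, so treat it only as an illustration of reader/writer exclusion: a read attempt fails while a write guard is alive, whereas two read guards can coexist.

```rust
use std::sync::RwLock;

/// Returns `(reader_blocked_while_writing, concurrent_readers_ok)`.
pub fn rwlock_contention_demo() -> (bool, bool) {
    // Stand-in for the shared collection of state tables.
    let state_tables = RwLock::new(vec![0_i64; 4]);

    // A task applying a chunk holds the write guard ...
    let mut write_guard = state_tables.write().unwrap();
    write_guard[0] += 1;
    // ... so a fetch that needs a read guard cannot proceed right now.
    let reader_blocked = state_tables.try_read().is_err();
    drop(write_guard);

    // Without a writer, several readers can hold the lock at the same time.
    let first_reader = state_tables.read().unwrap();
    let concurrent_readers_ok = state_tables.try_read().is_ok();
    drop(first_reader);

    (reader_blocked, concurrent_readers_ok)
}
```

Moving the apply step outside the fetching future, as suggested above, would shorten the window in which the write guard is held.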
Let's hold this PR for a while; I'll need more time to review it.
I guess the main problem may come from the API. Should we pass `&StateTable` as a parameter, or store it as an `Arc<Mutex<>>` directly in the state? Let me brief some points:
I chose this design because I read this RFC: https://singularity-data.quip.com/buN6ASPdobmk/RFC-Split-Dirty-State-and-Cache. We previously passed a WriteBatch, but I found the refactor cost is huge: a lot of trait methods are affected. So I don't feel strongly about this and am looking for suggestions. Overall, the two approaches should perform the same in theory and differ only in the API.
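For readers comparing the two API shapes, a rough sketch follows. All types here (`StateTable`, `BorrowingState`, `OwningState`) are simplified, hypothetical stand-ins, and a max over an in-memory vector stands in for the real aggregation.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, heavily simplified state table.
#[derive(Default)]
pub struct StateTable {
    pub rows: Vec<i64>,
}

/// Shape 1: the caller passes `&StateTable` on every call.
pub struct BorrowingState;

impl BorrowingState {
    pub fn get_output(&self, table: &StateTable) -> Option<i64> {
        table.rows.iter().copied().max()
    }
}

/// Shape 2: the state stores a shared, lock-protected handle itself.
pub struct OwningState {
    table: Arc<Mutex<StateTable>>,
}

impl OwningState {
    pub fn new(table: Arc<Mutex<StateTable>>) -> Self {
        Self { table }
    }

    pub fn get_output(&self) -> Option<i64> {
        self.table.lock().unwrap().rows.iter().copied().max()
    }
}
```

Shape 1 changes every method signature that needs the table; shape 2 keeps signatures stable but hides the locking inside the state. Both read the same data, so the difference is in the API rather than in the work done.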
d8b15d2 to 9e48c04
Hold this PR until CI passes.
7075aff to f2e4773
f2e4773 to c052e9c
c052e9c to d38f25a
Any update 🥰? There are some small tweaks that could be done in this PR, if there are no big changes.
commit 309ce36 Author: Bowen <[email protected]> Date: Tue Jun 21 13:19:44 2022 +0800 refactor(agg): clean up unused fields & refactor (#3339) * refactor(agg): clean up unused fields * delete file Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit e48dced Author: Shmiwy <[email protected]> Date: Tue Jun 21 12:55:21 2022 +0800 feat(storage): support compression setting per level (#3362) Signed-off-by: Shmiwy <[email protected]> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 961936f Author: Alex Chi <[email protected]> Date: Tue Jun 21 12:42:45 2022 +0800 feat(test): parallelize sqlsmith test (#3360) * feat(test): parallelize sqlsmith test Signed-off-by: Alex Chi <[email protected]> * more tests Signed-off-by: Alex Chi <[email protected]> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit fa90541 Author: TennyZhuang <[email protected]> Date: Tue Jun 21 12:25:33 2022 +0800 chore(github): feature-request template should use the feature label (#3359) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 02405e1 Author: Bowen <[email protected]> Date: Tue Jun 21 12:13:07 2022 +0800 style: add more comments & refactor on pg-wire (#3358) style: add more comments on pg-wire code Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 1c09432 Author: Li0k <[email protected]> Date: Tue Jun 21 12:00:37 2022 +0800 fix(storage): fix slow unit-test in compactor_test (#3357) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 19036a6 Author: xxchan <[email protected]> Date: Tue Jun 21 05:48:08 2022 +0200 fix(binder): do not allow correlated subquery in join tables (#3352) * fix(binder): do not allow correlated subquery in join tables * clippy Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit c37c9c4 Author: TennyZhuang <[email protected]> Date: Tue 
Jun 21 11:24:40 2022 +0800 refactor: remove unnecessary lazy_static (#3353) Signed-off-by: TennyZhuang <[email protected]> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 68596ab Author: Croxx <[email protected]> Date: Tue Jun 21 11:11:53 2022 +0800 feat(cache): introduce LruCacheEventListener to subscribe erasure and eviction (#3334) commit 5627f25 Author: Steven Chua <[email protected]> Date: Tue Jun 21 10:54:12 2022 +0800 feat(ctl): Display SstableIdInfo and Block Metadata in sst-dump (#3338) * feat(ctl): Add sst-dump command to risectl * feat(ctl): Fix risectl compatibility and remove VNode info * feat(ctl): Add checksum and compression algo for each block * feat(ctl): Add SstableIdInfo data to sst-dump * feat(ctl): Fix compilation errors * feat(ctl): Fix compilation errors and bugs commit 16ffd98 Author: Name1e5s <[email protected]> Date: Tue Jun 21 10:50:33 2022 +0800 fix(expr): cast int16/int32/int64/float32 to float64 in floor/ceil/round (#3319) * fix(expr): cast int16/int32/int64/float32 to float64 in floor/ceil/round * fix plan Co-authored-by: TennyZhuang <[email protected]> commit 964bb92 Author: TennyZhuang <[email protected]> Date: Tue Jun 21 10:23:56 2022 +0800 ci(Mergify): configuration update (#3355) Signed-off-by: null <[email protected]> commit dac904e Author: jon-chuang <[email protected]> Date: Tue Jun 21 09:57:33 2022 +0800 feat(executor): streaming hyperloglog improvements (#3315) * minor * rename tests * minor * remove option, const eval of param, better comments, succint tests commit cd4f302 Author: TennyZhuang <[email protected]> Date: Tue Jun 21 09:34:54 2022 +0800 ci(Mergify): configuration update (#3252) * ci(Mergify): configuration update Signed-off-by: null <[email protected]> * Update .mergify.yml * Update .mergify.yml * Update .mergify.yml Co-authored-by: xxchan <[email protected]> commit 2aa7e8e Author: Xinpeng Wei <[email protected]> Date: Mon Jun 20 22:11:51 2022 +0800 feat(frontend): add 
InternalStateTable Catalog (#3139) * use TableMessage for internal table * fix risedev check * update planner test * fix unit test * fix misc check * fix ci * fix issues in PR comments * fix clippy * fix ci * update planner test commit 1e190dd Author: xxchan <[email protected]> Date: Mon Jun 20 14:16:12 2022 +0200 fix(binder): do not allow correlated input ref in order by (#3346) commit 263d770 Author: Tao Wu <[email protected]> Date: Mon Jun 20 18:54:25 2022 +0800 fix: build failure caused by OptimzierContext::new (#3340) commit 9f18401 Author: Tao Wu <[email protected]> Date: Mon Jun 20 17:47:22 2022 +0800 feat: introduce the framework of sqlsmith (#3305) commit 096a991 Author: Alex Chi <[email protected]> Date: Mon Jun 20 17:40:05 2022 +0800 feat(ctl): add bench command (#3337) Signed-off-by: Alex Chi <[email protected]> commit 075d596 Author: TennyZhuang <[email protected]> Date: Mon Jun 20 17:05:50 2022 +0800 build: bump toolchain to 20220620 (#3324) * build: bump toolchain to 20220620 Signed-off-by: TennyZhuang <[email protected]> * also update docker-compose Signed-off-by: TennyZhuang <[email protected]> commit d864f30 Author: Wenzhuo Liu <[email protected]> Date: Mon Jun 20 16:38:21 2022 +0800 feat: add output_indices to join executors (#3047) commit 13b9d58 Author: StrikeW <[email protected]> Date: Mon Jun 20 16:29:56 2022 +0800 feat(stream): enable append-only mv plan for kafka source (#3333) commit ea386d3 Author: Liang <[email protected]> Date: Mon Jun 20 15:50:55 2022 +0800 refactor(compaction): deprecate the HashStrategy for OverlapStrategy (#3331) commit a9fba38 Author: Liang <[email protected]> Date: Mon Jun 20 15:34:25 2022 +0800 fix(picker): fetch info from table_id field in sstableinfo (#3332) commit 5d2bb42 Author: Bohan Zhang <[email protected]> Date: Mon Jun 20 14:55:59 2022 +0800 test(stream): add ci for split change mutation in source (#3039) * stage Signed-off-by: tabVersion <[email protected]> * stage Signed-off-by: tabVersion <[email 
protected]> * add test Signed-off-by: tabVersion <[email protected]> * change e2e to datagen Signed-off-by: tabVersion <[email protected]> * stage Signed-off-by: tabVersion <[email protected]> * some bug to fix Signed-off-by: tabVersion <[email protected]> * fix async issue Signed-off-by: tabVersion <[email protected]> * add assert Signed-off-by: tabVersion <[email protected]> commit 04fe6d6 Author: Liang <[email protected]> Date: Mon Jun 20 14:47:56 2022 +0800 refactor(vnode bitmap): remove vnode bitmap in sst info (#3329) commit c6d1288 Author: Li0k <[email protected]> Date: Mon Jun 20 14:29:56 2022 +0800 feat(storage): add manual compaction picker for targeted compaction (#3288) * feat(storage): add ManualCompactionPicker * feat(storage): distinguish get_compaction_task for manual * feat(storage): meta client support more parameters for manual_compaction * chore(storage): add tracing and some notes * chore(storage): split manual_compaction_picker to independent file * feat(storage): fix target_input check pending and support manual_pick for dynamic_level_selector * fix(storage): internal_table_id include mv_id * fix(storage): fix picker check target_input_ssts pending * fix(storage): fix picker with total_file_size commit d04954f Author: Renjie Liu <[email protected]> Date: Mon Jun 20 14:27:01 2022 +0800 fix(ci): Reduce log (#3330) commit eca9239 Author: Bugen Zhao <[email protected]> Date: Mon Jun 20 13:59:56 2022 +0800 refactor(storage): remove `Option` on pk serializer of cell-based table (#3328) * minor refactor Signed-off-by: Bugen Zhao <[email protected]> * remove option of pk serializer Signed-off-by: Bugen Zhao <[email protected]> * remove pk serializer in state table Signed-off-by: Bugen Zhao <[email protected]> * extract vnode compute Signed-off-by: Bugen Zhao <[email protected]> * remove into order types Signed-off-by: Bugen Zhao <[email protected]> commit 9169436 Author: Bowen <[email protected]> Date: Mon Jun 20 13:44:10 2022 +0800 feat: apply 
relational refactor for hash agg (max, min) (#2999) * feat: two closure can not get mut ref of same variable * use Arc::Mutex to wrap the state table * roll back string agg * add StateTable to get_output * finish basic coding (unit test failed) * finish basic coding * fix bug * show case * use empty Row for scan * tweak commit 35bb16a Author: Bugen Zhao <[email protected]> Date: Mon Jun 20 13:43:11 2022 +0800 refactor: use packed bitmap struct for vnode bitmap (#3310) * use bitmap in streaming Signed-off-by: Bugen Zhao <[email protected]> * use bitmap in storage Signed-off-by: Bugen Zhao <[email protected]> * minor fix Signed-off-by: Bugen Zhao <[email protected]> * make bitmap optional Signed-off-by: Bugen Zhao <[email protected]> commit be50f93 Author: Liang <[email protected]> Date: Mon Jun 20 13:27:37 2022 +0800 feat(compaction): let compactor be unaware of vnode mapping (#3321) commit 88258a6 Author: lmatz <[email protected]> Date: Sun Jun 19 21:31:01 2022 -0700 doc: no need to manually check in PR from forks (#3325) commit c590e18 Author: zwang28 <[email protected]> Date: Mon Jun 20 12:13:50 2022 +0800 refactor(storage): split HummockVersion's levels by compaction group. 
(#3206) commit f1c3298 Author: zwang28 <[email protected]> Date: Mon Jun 20 11:35:18 2022 +0800 feat(meta): register source to compaction group manager (#3300) commit bf08b54 Author: Zack <[email protected]> Date: Mon Jun 20 11:25:39 2022 +0800 feat(frontend): Add sql string into context for debugging (#3312) * feat(frontend): Add sql string into context for debugging * Remove renaming * Refactor to use str commit f784ba3 Author: zwang28 <[email protected]> Date: Mon Jun 20 11:22:12 2022 +0800 feat(storage): shared buffer flush L0 by compaction group (#3200) commit bd48bba Author: Kexiang Wang <[email protected]> Date: Sun Jun 19 07:48:45 2022 -0400 feat: modify interfaces to support specifying parallelism for each fr… (#3283) feat: modify interfaces to support specifying parallelism for each fragment commit 8f0e0b2 Author: Steven Chua <[email protected]> Date: Sun Jun 19 12:56:51 2022 +0800 feat(ctl): Support basic sst dump in risectl (#3309) * feat(ctl): Add sst-dump command to risectl * feat(ctl): Fix risectl compatibility and remove VNode info commit daf9222 Author: Alex Chi <[email protected]> Date: Sat Jun 18 21:41:44 2022 +0800 feat(risedev): generate risectl config (#3318) * feat(risedev): generate risectl config Signed-off-by: Alex Chi <[email protected]> * fix Signed-off-by: Alex Chi <[email protected]> commit 86ff992 Author: Alex Chi <[email protected]> Date: Sat Jun 18 20:52:51 2022 +0800 feat(ctl): support table scan (#3317) * feat(ctl): support table scan Signed-off-by: Alex Chi <[email protected]> * license header Signed-off-by: Alex Chi <[email protected]> * add docs Signed-off-by: Alex Chi <[email protected]> commit 5ac5637 Author: Yikun Chen <[email protected]> Date: Sat Jun 18 08:03:06 2022 -0400 feat: support interval comparison (#3222) 1. fix timestamp substract timestamp. 2. support interval comparison. From pgsql, 1 month equal to 30 days and 1 day equal to 86400000 ms. Signed-off-by: Little-Wallace <[email protected]>
commit 722ff53 Author: Yuanxin Cao <[email protected]> Date: Fri Jun 17 16:47:22 2022 +0800 feat(meta): inform frontend of mview data distribution (risingwavelabs#3304) * feat(meta): inform frontend of mview data distribution * fix ut * set vnode mapping for materialzied source * move ParallelUnitId into common, move vnode related contants into common/types commit 8571ff4 Author: Renjie Liu <[email protected]> Date: Fri Jun 17 16:46:17 2022 +0800 feat(batch): All tests should run in both local and distributed mode (risingwavelabs#3306) * feat(batch): All tests should run in both local and distributed mode Signed-off-by: Little-Wallace <[email protected]>
What's changed and what's your intention?
Sorry for the huge PR, but this should be the minimum effort needed to introduce the relational refactor. The main change is the AggState interface (mostly `get_output`): we now pass StateTable as & or &mut. After this PR, all agg calls other than StringAgg should use the relational table (StringAgg currently has no e2e test, so I prefer not to include it here).
- `get_output(epoch)` -> `get_output(epoch, &StateTable<S>)`. The states may need to fetch data from the remote store.
- `row_count`, `mark_as_dirty`.
- `Arc<Mutex<>>` for all State Tables, because I do not want to solve difficult multi-threading problems in this PR. We discussed this and think `apply_batch` (mostly mem_table writes) should not be parallelized, so the lock is not a significant problem.
There will be a lot of TODOs to make the code simpler and clearer:
See #3235.
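As a rough illustration of the `get_output(epoch)` -> `get_output(epoch, &StateTable<S>)` change described above, the sketch below passes the table by reference and falls back to it when the in-memory cache is empty. The names `StateTable` and `MaxState`, the `(epoch, value)` row layout and the absence of a store type parameter are all assumptions made for brevity; cache invalidation is not modelled.

```rust
/// Hypothetical, simplified table holding committed `(epoch, value)` pairs.
pub struct StateTable {
    pub rows: Vec<(u64, i64)>,
}

/// Hypothetical max-aggregation state with a one-slot cache.
pub struct MaxState {
    cache: Option<i64>,
}

impl MaxState {
    pub fn new() -> Self {
        Self { cache: None }
    }

    /// New-style signature: the table is passed in by reference, so the state
    /// can read from it whenever the cache has nothing to offer.
    pub fn get_output(&mut self, epoch: u64, table: &StateTable) -> Option<i64> {
        if self.cache.is_none() {
            // Cache miss: scan the rows visible at `epoch`.
            self.cache = table
                .rows
                .iter()
                .filter(|(row_epoch, _)| *row_epoch <= epoch)
                .map(|(_, value)| *value)
                .max();
        }
        self.cache
    }
}
```

In the real executor the fallback would be an asynchronous read against the state store; the synchronous scan here only shows why the table has to be reachable from `get_output`.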
Checklist
./risedev check (or alias, ./risedev c)
Refer to a related PR or issue link (optional)