feat(cache): introduce tiered cache abstraction #4406
Codecov Report
@@ Coverage Diff @@
## main #4406 +/- ##
==========================================
- Coverage 74.54% 74.45% -0.09%
==========================================
Files 848 848
Lines 124226 124497 +271
==========================================
+ Hits 92605 92697 +92
- Misses 31621 31800 +179
    FileCache(file_cache::cache::FileCacheOptions),
}

pub enum TieredCache<K, V>
How will this enum be used? Will NoneCache only be used on non-Linux targets? If so, I wonder whether we can avoid creating a TieredCache on non-Linux targets instead of introducing an enum, which makes every interface method come with a match.
NoneCache can be used on any target, but FileCache only supports the Linux target.
Introducing the TieredCache enum makes it easier to integrate the tiered cache into Hummock; otherwise we would have to add #[cfg(target_os = "linux")] everywhere we use the file cache in Hummock code.
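To make the trade-off concrete, here is a minimal sketch of the enum-dispatch idea. It is not the code from this PR: the `NoneCache` and `FileCache` bodies are hypothetical stand-ins (synchronous, keyed by `u64`, backed by an in-memory map), whereas the real types take options, are generic over key and value, and are async. What it shows is that the `cfg` gates sit on the type and the enum variant only, so callers never need them.

```rust
#[cfg(target_os = "linux")]
use std::collections::HashMap;

/// Hypothetical no-op cache, available on every target.
#[derive(Default)]
pub struct NoneCache;

impl NoneCache {
    pub fn insert(&mut self, _key: u64, _value: Vec<u8>) {}

    pub fn get(&self, _key: u64) -> Option<Vec<u8>> {
        None
    }
}

/// Hypothetical stand-in for the Linux-only file cache.
/// An in-memory map keeps the sketch self-contained.
#[cfg(target_os = "linux")]
#[derive(Default)]
pub struct FileCache {
    entries: HashMap<u64, Vec<u8>>,
}

#[cfg(target_os = "linux")]
impl FileCache {
    pub fn insert(&mut self, key: u64, value: Vec<u8>) {
        self.entries.insert(key, value);
    }

    pub fn get(&self, key: u64) -> Option<Vec<u8>> {
        self.entries.get(&key).cloned()
    }
}

/// Callers hold this enum; the `cfg` attributes live here, not at call sites.
pub enum TieredCache {
    NoneCache(NoneCache),
    #[cfg(target_os = "linux")]
    FileCache(FileCache),
}

impl TieredCache {
    pub fn insert(&mut self, key: u64, value: Vec<u8>) {
        match self {
            TieredCache::NoneCache(cache) => cache.insert(key, value),
            #[cfg(target_os = "linux")]
            TieredCache::FileCache(cache) => cache.insert(key, value),
        }
    }

    pub fn get(&self, key: u64) -> Option<Vec<u8>> {
        match self {
            TieredCache::NoneCache(cache) => cache.get(key),
            #[cfg(target_os = "linux")]
            TieredCache::FileCache(cache) => cache.get(key),
        }
    }
}
```

The cost the reviewer points out is visible here too: every method on `TieredCache` repeats the same `match`.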
I was thinking about introducing a unified cache interface for both the memory cache and the file cache. In that case, if the target OS is Linux and the file cache is enabled, we create a cache instance containing both a memory cache and a file cache (similar to what we did in HybridObjectStore). Otherwise, we only create a block cache instance. I don't have a strong opinion on this, so we can revisit it once the file cache is fully integrated with the memory cache.
I'm not quite sure unifying the memory cache and the tiered cache is a good idea 🤔. Memory cache calls are synchronous but tiered cache calls are asynchronous, so unifying them may introduce extra cost for the memory cache. Besides (a weaker point), #[async_trait] is not zero-cost yet; we would need to use GATs to reduce the boxing cost (which is not that friendly for everyone, but affordable).
> I'm not quite sure unifying the memory cache and the tiered cache is a good idea 🤔. Memory cache calls are synchronous but tiered cache calls are asynchronous, so unifying them may introduce extra cost for the memory cache. Besides (a weaker point), #[async_trait] is not zero-cost yet; we would need to use GATs to reduce the boxing cost (which is not that friendly for everyone, but affordable).
Ah... I missed that the file cache has an async interface. Just ignore what I just said.
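For readers unfamiliar with the boxing cost mentioned in this thread, here is a small sketch written without the `async-trait` crate. The trait method below has roughly the shape an `#[async_trait]` method takes after expansion: it returns a pinned, boxed, type-erased future. `MemoryCache`, `AsyncCache`, `SlowCache`, and the toy `block_on` helper are all hypothetical names for illustration, not types from this PR.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Synchronous in-memory lookup: no future and no allocation per call.
pub struct MemoryCache {
    pub entries: HashMap<u64, u64>,
}

impl MemoryCache {
    pub fn get(&self, key: u64) -> Option<u64> {
        self.entries.get(&key).copied()
    }
}

/// Roughly the shape of an `#[async_trait]` method after expansion:
/// every call returns a heap-allocated, type-erased future.
pub trait AsyncCache {
    fn get<'a>(&'a self, key: u64) -> Pin<Box<dyn Future<Output = Option<u64>> + Send + 'a>>;
}

/// Hypothetical async tier; it reuses a map to keep the sketch short.
pub struct SlowCache {
    pub entries: HashMap<u64, u64>,
}

impl AsyncCache for SlowCache {
    fn get<'a>(&'a self, key: u64) -> Pin<Box<dyn Future<Output = Option<u64>> + Send + 'a>> {
        // `Box::pin` is the per-call allocation the comment refers to.
        Box::pin(async move { self.entries.get(&key).copied() })
    }
}

/// Waker that does nothing; enough for futures that are ready immediately.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor that busy-polls until the future is ready.
/// Only suitable for this demo, not for real workloads.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}
```

Forcing `MemoryCache::get` behind the same trait would make each in-memory lookup pay for a boxed future, which is the extra cost raised above.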
@wenym1 PTAL
@@ -35,16 +36,81 @@ pub enum FsType {
    Xfs,
}

pub struct WriteBatch<K, V>
IMO, a more elegant and Rust-style way to express the semantics of a batch write would be to have a StoreBatchWriter
pub struct StoreBatchWriter<'a, K, V>
where
    K: TieredCacheKey,
    V: TieredCacheValue,
{
    keys: Vec<K>,
    buffer: DioBuffer,
    blocs: Vec<BlockLoc>,
    block_size: usize,
    _phantom: PhantomData<(K, V)>,
    store: &'a mut Store,
}
and there would be a start_batch_writer method in Store, which returns a StoreBatchWriter.
When finish is called on the StoreBatchWriter, it does the encoding and writes to the Store. This way, the definition of Store does not need an extra generic type V, and we don't need to add the new batch and insert methods to Store.
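A minimal sketch of the suggested shape, under loud assumptions: `Store` here is a hypothetical in-memory byte log, keys are plain `u64` rather than a generic `K`, and `encode` is an assumed method on the value trait. The real `Store`, `DioBuffer`, and `BlockLoc` are not reproduced. The point it illustrates is that the writer borrows the store mutably, buffers encoded entries, and only touches the store in `finish`, so `Store` itself carries no `V` parameter.

```rust
use std::marker::PhantomData;

/// Hypothetical value trait; only encoding is needed for this sketch.
pub trait TieredCacheValue {
    fn encode(&self, buf: &mut Vec<u8>);
}

impl TieredCacheValue for String {
    fn encode(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(self.as_bytes());
    }
}

/// Hypothetical store: an append-only byte log plus an index of
/// (key, offset, len). It is not generic over the value type.
#[derive(Default)]
pub struct Store {
    pub data: Vec<u8>,
    pub index: Vec<(u64, usize, usize)>,
}

impl Store {
    /// The value type is chosen per batch, not per store.
    pub fn start_batch_writer<V: TieredCacheValue>(&mut self) -> StoreBatchWriter<'_, V> {
        StoreBatchWriter {
            keys: Vec::new(),
            buffer: Vec::new(),
            locs: Vec::new(),
            store: self,
            _phantom: PhantomData,
        }
    }
}

pub struct StoreBatchWriter<'a, V: TieredCacheValue> {
    keys: Vec<u64>,
    buffer: Vec<u8>,
    /// (offset, len) of each entry inside `buffer`.
    locs: Vec<(usize, usize)>,
    store: &'a mut Store,
    _phantom: PhantomData<V>,
}

impl<'a, V: TieredCacheValue> StoreBatchWriter<'a, V> {
    /// Encode the value into the batch buffer; the store is untouched.
    pub fn append(&mut self, key: u64, value: &V) {
        let start = self.buffer.len();
        value.encode(&mut self.buffer);
        self.keys.push(key);
        self.locs.push((start, self.buffer.len() - start));
    }

    /// Consume the writer and write the whole batch to the store at once.
    pub fn finish(self) {
        let StoreBatchWriter { keys, buffer, locs, store, .. } = self;
        let base = store.data.len();
        store.data.extend_from_slice(&buffer);
        for (key, (offset, len)) in keys.into_iter().zip(locs) {
            store.index.push((key, base + offset, len));
        }
    }
}
```

Because the writer holds `&mut Store`, the borrow checker prevents any other access to the store until `finish` consumes the writer (or the writer is dropped, which discards the batch).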
Nice advice! I'll refactor that.
Rest LGTM
commit 2f98f9c
Author: Croxx
Date: Thu Aug 4 17:49:32 2022 +0800
feat(cache): introduce tiered cache abstraction (risingwavelabs#4406)
* feat(cache): introduce tiered cache abstraction
* refactor write batch design
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
Changes:
* TieredCache and related abstractions, to unify the underlying tiered cache implementation on various targets (non-Linux targets don't support the file cache).
* TieredCacheValue trait to support encode/decode only on demand.
* Filter in the file cache system.
Checklist
* ./risedev check (or alias, ./risedev c)
Refer to a related PR or issue link (optional)
#198
#4050