refactor(mito): Allow creating multiple files in ParquetWriter #5291
- Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths.
- Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management.
- Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths.
- Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`.
- **Enhanced Indexer Management**:
  - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation.
  - Updated `ParquetWriter` to handle multiple indexers and file IDs.
  - Files affected: `index.rs`, `parquet.rs`, `writer.rs`.
- **Removed Redundant File ID Handling**:
  - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`.
  - Updated related logic to dynamically generate file IDs where necessary.
  - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`.
- **Test Adjustments**:
  - Updated tests to align with new path and indexer management.
  - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes.
  - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`.
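To make the path-provider idea concrete, here is a minimal sketch of how such a trait could look. It is not the code from this PR: the method names, the `u64`-backed `FileId` stand-in, and the directory layout are all assumptions chosen for illustration.

```rust
/// Stand-in for the real file id type (an assumption for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileId(pub u64);

/// Hypothetical shape of a path provider: callers hand over a file id
/// and get back the SST path and the index path for that id.
pub trait FilePathProvider {
    fn build_sst_file_path(&self, file_id: FileId) -> String;
    fn build_index_file_path(&self, file_id: FileId) -> String;
}

/// Derives paths from a region directory. The layout below is assumed.
pub struct RegionFilePathFactory {
    pub region_dir: String,
}

impl FilePathProvider for RegionFilePathFactory {
    fn build_sst_file_path(&self, file_id: FileId) -> String {
        format!("{}/{}.parquet", self.region_dir, file_id.0)
    }

    fn build_index_file_path(&self, file_id: FileId) -> String {
        format!("{}/index/{}.puffin", self.region_dir, file_id.0)
    }
}

/// Test helper in the spirit of `FixedPathProvider`: it ignores the
/// requested id and always answers with paths for one fixed id.
pub struct FixedPathProvider {
    pub dir: String,
    pub file_id: FileId,
}

impl FilePathProvider for FixedPathProvider {
    fn build_sst_file_path(&self, _file_id: FileId) -> String {
        format!("{}/{}.parquet", self.dir, self.file_id.0)
    }

    fn build_index_file_path(&self, _file_id: FileId) -> String {
        format!("{}/index/{}.puffin", self.dir, self.file_id.0)
    }
}
```

Because a request carries a provider instead of a finished path, the code that eventually picks the file id can resolve both paths itself.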
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5291 +/- ##
==========================================
- Coverage 84.07% 83.84% -0.23%
==========================================
Files 1199 1199
Lines 224193 224648 +455
==========================================
- Hits 188493 188366 -127
- Misses 35700 36282 +582
LGTM
@@ -271,15 +271,14 @@ impl Compactor for DefaultCompactor {
             compacted_inputs.extend(output.inputs.iter().map(|f| f.meta_ref().clone()));

             info!(
-                "Compaction region {} output [{}]-> {}",
+                "Region {} compaction input: [{}]",
We need to find a way to log the relationship of compaction input and output.
#[async_trait::async_trait]
pub trait IndexerBuilder {
    /// Builds indexer of given file id to [index_file_path].
    async fn build(&self, file_id: FileId, index_file_path: String) -> Indexer;
}
Do we need a trait for the builder? We only have one builder implementation.
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR mainly refactors `ParquetWriter` so it can create multiple output files.

- **Move flush/compaction output file id generation from outside to inside `ParquetWriter`**:
  - Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to generate SST and index file paths, because we no longer know the file id beforehand.
  - Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management.
  - Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths.
  - Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`.
- **Enhanced Indexer Management**:
  - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation, so we can defer creating the indexer until the `FileId` is determined.
  - Updated `ParquetWriter` to handle multiple indexers and file IDs.
  - Files affected: `index.rs`, `parquet.rs`, `writer.rs`.
- **Removed Redundant File ID Handling**:
  - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`.
  - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`.
- **Test Adjustments**:
  - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes.
  - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`.
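The flow described above (the writer picks the file id, then asks the path provider and the indexer builder for everything derived from it) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `SketchWriter`, `DirPathProvider`, `PlainIndexerBuilder`, the row-count roll-over policy, the counter-based `FileId`, and the synchronous `build` are all hypothetical simplifications, and no Parquet data is actually written.

```rust
/// Stand-in for the real file id type (assumption: a simple counter).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileId(pub u64);

/// Hypothetical provider: resolves paths once the writer has picked an id.
pub trait FilePathProvider {
    fn build_sst_file_path(&self, file_id: FileId) -> String;
    fn build_index_file_path(&self, file_id: FileId) -> String;
}

/// Placeholder indexer that only remembers where it would write.
pub struct Indexer {
    pub index_file_path: String,
}

/// Synchronous simplification of the async builder quoted in the review.
pub trait IndexerBuilder {
    fn build(&self, file_id: FileId, index_file_path: String) -> Indexer;
}

/// Example provider with an assumed directory layout.
pub struct DirPathProvider {
    pub dir: String,
}

impl FilePathProvider for DirPathProvider {
    fn build_sst_file_path(&self, file_id: FileId) -> String {
        format!("{}/{}.parquet", self.dir, file_id.0)
    }
    fn build_index_file_path(&self, file_id: FileId) -> String {
        format!("{}/index/{}.puffin", self.dir, file_id.0)
    }
}

/// Example builder that just records the index path.
pub struct PlainIndexerBuilder;

impl IndexerBuilder for PlainIndexerBuilder {
    fn build(&self, _file_id: FileId, index_file_path: String) -> Indexer {
        Indexer { index_file_path }
    }
}

/// Metadata for one finished output file.
#[derive(Debug, PartialEq, Eq)]
pub struct SstInfo {
    pub file_id: FileId,
    pub sst_path: String,
    pub index_path: String,
    pub num_rows: usize,
}

struct OpenFile {
    file_id: FileId,
    sst_path: String,
    indexer: Indexer,
    num_rows: usize,
}

pub struct SketchWriter<P: FilePathProvider, B: IndexerBuilder> {
    path_provider: P,
    indexer_builder: B,
    max_rows_per_file: usize,
    next_id: u64,
    current: Option<OpenFile>,
    finished: Vec<SstInfo>,
}

impl<P: FilePathProvider, B: IndexerBuilder> SketchWriter<P, B> {
    pub fn new(path_provider: P, indexer_builder: B, max_rows_per_file: usize) -> Self {
        assert!(max_rows_per_file > 0, "row limit must be positive");
        Self { path_provider, indexer_builder, max_rows_per_file, next_id: 0, current: None, finished: Vec::new() }
    }

    /// Picks a file id inside the writer, then derives paths and the indexer from it.
    fn open_file(&mut self) -> OpenFile {
        let file_id = FileId(self.next_id);
        self.next_id += 1;
        let sst_path = self.path_provider.build_sst_file_path(file_id);
        let index_path = self.path_provider.build_index_file_path(file_id);
        let indexer = self.indexer_builder.build(file_id, index_path);
        OpenFile { file_id, sst_path, indexer, num_rows: 0 }
    }

    /// Pretends to write `rows` rows, rolling over to a new file at the limit.
    pub fn write_rows(&mut self, mut rows: usize) {
        while rows > 0 {
            let mut file = match self.current.take() {
                Some(file) => file,
                None => self.open_file(),
            };
            let n = rows.min(self.max_rows_per_file - file.num_rows);
            file.num_rows += n;
            rows -= n;
            if file.num_rows == self.max_rows_per_file {
                self.seal(file);
            } else {
                self.current = Some(file);
            }
        }
    }

    fn seal(&mut self, file: OpenFile) {
        let OpenFile { file_id, sst_path, indexer, num_rows } = file;
        self.finished.push(SstInfo { file_id, sst_path, index_path: indexer.index_file_path, num_rows });
    }

    /// Seals any partially filled file and returns metadata for all outputs.
    pub fn finish(mut self) -> Vec<SstInfo> {
        if let Some(file) = self.current.take() {
            self.seal(file);
        }
        self.finished
    }
}
```

In this sketch the caller never supplies a file id, which mirrors why `file_id` could be dropped from the write request: one write call may yield several files, each with its own id, paths, and indexer.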
.PR Checklist
Please convert it to a draft if some of the following conditions are not met.