chore: tweak deletion batch size & log messages #15005

Merged

@dantengsky dantengsky commented Mar 19, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Tweak the batch size used in remove_file_in_batch.
    Instead of using a fixed batch size of 1000 for file deletion, derive the batch size from the max_threads setting so that files can be deleted with more parallelism, i.e. use "(number of files) / max_threads" as the batch size.

In latency-sensitive scenarios, adjusting the batch size for object storage file deletion may improve performance significantly:

before this PR (with customized log)

EC2 query node + S3 in the same region

src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 80, time used 774.073632ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 60, time used 648.507035ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 20, time used 264.455804ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 40, time used 510.963558ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 20, time used 273.607883ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 20, time used 266.098798ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 20, time used 294.253539ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 33, time used 426.443657ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 27, time used 406.260959ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 20, time used 274.669911ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 20, time used 341.166744ms
src/query/storages/fuse/src/io/files.rs:85 deleted files, # of files 37, time used 488.287495ms

with this PR:

src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 80 files in 200.192256ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 29 files in 88.241821ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 31 files in 111.350945ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 20 files in 74.052776ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 18 files in 67.15381ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 12 files in 38.340867ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 10 files in 55.873557ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 20 files in 58.602681ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 19 files in 58.248392ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 3 files in 55.438356ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 18 files in 71.469202ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 20 files in 65.742636ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 16 files in 49.608766ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 4 files in 35.568428ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 20 files in 72.753463ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 20 files in 74.300976ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 18 files in 78.302705ms
src/query/service/src/pipelines/builders/builder_on_finished.rs:126 purge 13 files in 115.849776ms
  • Log (at info level) the time spent in the logical plan and binding phases.

  • Fixes #[Link the issue here]
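
The batch-size rule from the first summary item can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the helper names `deletion_batch_size` and `split_into_batches` are hypothetical, and the ceiling division, the lower bound of 1, and the handling of a zero thread count are assumptions added so the sketch is safe to run.

```rust
/// Hypothetical helper: derive a deletion batch size from the number of
/// files and the `max_threads` setting, following the rule
/// "(number of files) / max_threads".
fn deletion_batch_size(num_files: usize, max_threads: usize) -> usize {
    // Assumption: treat a thread count of 0 as 1 to avoid dividing by zero.
    let threads = max_threads.max(1);
    // Assumption: round up so that at most `threads` batches are produced,
    // and never return 0, because `slice::chunks(0)` panics.
    ((num_files + threads - 1) / threads).max(1)
}

/// Hypothetical helper: split file paths into batches that could then be
/// handed to concurrent delete tasks.
fn split_into_batches(files: &[String], max_threads: usize) -> Vec<Vec<String>> {
    let size = deletion_batch_size(files.len(), max_threads);
    files.chunks(size).map(|chunk| chunk.to_vec()).collect()
}
```

Under these assumptions, 80 files with max_threads = 16 are split into 16 batches of 5 files each, whereas a fixed batch size of 1000 would place all 80 files in a single batch.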

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-chore this PR only has small changes that no need to record, like coding styles. label Mar 19, 2024
@dantengsky dantengsky force-pushed the chore-tweak-deletion-batch-size branch from 9867d67 to c8d3c03 Compare March 19, 2024 02:28
@dantengsky dantengsky changed the title chore: tweak deletion batch size chore: tweak deletion batch size & log messages Mar 19, 2024
@dantengsky dantengsky marked this pull request as ready for review March 19, 2024 07:01
@BohuTANG BohuTANG merged commit 657cfdd into databendlabs:main Mar 19, 2024
78 checks passed