refactor: make block meta easier to be cloned #8548

@dantengsky dantengsky commented Oct 31, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

After a SegmentInfo is read from storage, its block metas are owned by an Arc<SegmentInfo> (which may be cached).
Code that needs only the block metas, such as pruning, has no choice but to clone each BlockMeta, which hurts performance.

This PR changes SegmentInfo from

pub struct SegmentInfo {
    // ...
    pub blocks: Vec<BlockMeta>,
    // ...
}

to

pub struct SegmentInfo {
    // ...
    pub blocks: Vec<Arc<BlockMeta>>,
    // ...
}

and refactors related components accordingly.
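
To illustrate why the change helps, here is a minimal sketch, not the actual Databend code. The `BlockMeta` below is a hypothetical, trimmed-down stand-in (the real struct has more fields and different value types), and `block_metas` and `sample_segment` are helper names invented for this example. It shows that collecting block metas from a shared segment only bumps reference counts once the blocks are stored as `Arc<BlockMeta>`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified stand-in for the real BlockMeta.
#[derive(Clone, Debug, PartialEq)]
pub struct BlockMeta {
    pub row_count: u64,
    // Placeholder for per-column statistics; the real value type differs.
    pub col_stats: HashMap<u32, String>,
}

pub struct SegmentInfo {
    pub blocks: Vec<Arc<BlockMeta>>,
}

/// Collects the block metas of a (possibly cached) segment.
/// `Arc::clone` only increments a reference count, so the HashMap
/// inside each BlockMeta is never deep-copied here.
pub fn block_metas(segment: &Arc<SegmentInfo>) -> Vec<Arc<BlockMeta>> {
    segment.blocks.iter().map(Arc::clone).collect()
}

/// Builds a small segment for demonstration purposes (hypothetical helper).
pub fn sample_segment(n: u64) -> Arc<SegmentInfo> {
    let blocks = (0..n)
        .map(|i| {
            let mut col_stats = HashMap::new();
            col_stats.insert(0u32, format!("stats-{i}"));
            Arc::new(BlockMeta { row_count: i, col_stats })
        })
        .collect();
    Arc::new(SegmentInfo { blocks })
}
```

Each element returned by `block_metas` points at the same allocation as the corresponding entry in `segment.blocks`, which can be checked with `Arc::ptr_eq`.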

Fixes #issue

@mergify mergify bot added the pr-refactor this PR changes the code base without new features or bugfix label Oct 31, 2022

Xuanwo commented Oct 31, 2022

BlockMeta itself is cheap to clone. How about moving col_stats and col_metas into Arc<HashMap> instead?


dantengsky commented Oct 31, 2022

> BlockMeta itself is cheap to clone.

According to @sundy-li's performance profiling, cloning BlockMeta does hurt performance, and it most likely hurts badly for large tables.

> How about moving col_stats and col_metas into Arc<HashMap> instead?

Sounds reasonable to me. Let's do that refactor if performance profiling identifies them as bottlenecks.
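
For comparison, the alternative suggested above could look roughly like the sketch below. This is an assumption-laden illustration, not code from the PR: `BlockMetaShared`, `make_meta`, and the map value types are hypothetical. The idea is that the struct stays `Clone`, but the two maps sit behind `Arc`, so cloning the struct copies the scalar fields and bumps two reference counts.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical variant of BlockMeta with the maps wrapped in Arc.
#[derive(Clone)]
pub struct BlockMetaShared {
    pub row_count: u64,
    // Placeholder value types; the real statistics and column metas differ.
    pub col_stats: Arc<HashMap<u32, String>>,
    pub col_metas: Arc<HashMap<u32, (u64, u64)>>,
}

/// Builds one meta for demonstration purposes (hypothetical helper).
pub fn make_meta() -> BlockMetaShared {
    let mut stats = HashMap::new();
    stats.insert(0u32, String::from("min=1,max=9"));
    let mut metas = HashMap::new();
    metas.insert(0u32, (0u64, 128u64)); // (offset, length), illustrative only
    BlockMetaShared {
        row_count: 100,
        col_stats: Arc::new(stats),
        col_metas: Arc::new(metas),
    }
}
```

With this layout, `meta.clone()` yields a value whose `col_stats` and `col_metas` point at the same maps as the original. The trade-off compared with `Vec<Arc<BlockMeta>>` is two reference-count updates per clone instead of one, in exchange for keeping `BlockMeta` usable by value.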


Xuanwo commented Oct 31, 2022

> cloning BlockMeta does hurt performance, and it most likely hurts badly for large tables.

Yep. Most of the cost comes from cloning the hashmaps.

image

@sundy-li

It works

@dantengsky dantengsky marked this pull request as ready for review October 31, 2022 12:25
@dantengsky dantengsky requested review from Xuanwo and BohuTANG October 31, 2022 12:26
@BohuTANG

Because we clone BlockMeta, the hashmaps inside it are cloned as well; this PR avoids that.

@BohuTANG BohuTANG merged commit a2c30b8 into databendlabs:main Oct 31, 2022