refactor(hashtable): adaptive string hash table #7971

Merged (18 commits) on Oct 27, 2022

usamoi (Contributor) commented Sep 29, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

This is the pull request for the OSPP 2022 Adaptive String Hash Table project.

usamoi marked this pull request as draft on September 29, 2022 at 14:33.
mergify bot added the pr-refactor label (this PR changes the code base without new features or bugfix) on Sep 29, 2022.

FANNG1 commented Sep 30, 2022

Cool, are there any performance stats for the adaptive string hash table?

PsiACE (Member) commented Sep 30, 2022

> Cool, are there any performance stats for the adaptive string hash table?

https://github.com/usamoi/saha/blob/main/doc/benchmark.md

No performance evaluation of Databend with saha is available yet. For the performance of the saha implementation itself, see the link above.

Xuanwo (Member) commented Oct 26, 2022

Hi, any updates on this PR? The list of conflicting files grows longer every day...

Is it possible to split it into multiple PRs so that we don't have to wait on one large PR?

sundy-li (Member) commented Oct 27, 2022

I just benchmarked it with the hits dataset. It shows a 20%~50% performance improvement in group aggregation queries.

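| Query | Main | New hashtable | Ratio (old/new) | Comments |
|-------|------|---------------|-----------------|----------|
| Q1 | 1.05 | 0.94 | 1.117021277 | |
| Q2 | 2.9 | 1.98 | 1.464646465 | distinct int |
| Q3 | 1.5 | 1.3 | 1.153846154 | distinct string |
| Q4 | 0.05 | 0.05 | 1 | |
| Q5 | 2.62 | 1.78 | 1.471910112 | group by int |
| Q6 | 2.79 | 1.97 | 1.416243655 | group by int |
| Q7 | 0.38 | 0.3 | 1.266666667 | group by string |
| Q8 | 0.43 | 0.34 | 1.264705882 | group by string |
| Q9 | 3.1 | 2.83 | 1.09540636 | |
| Q10 | 3.42 | 3.42 | 1 | |
| Q11 | 20.72 | 20.83 | 0.9947191551 | |
| Q12 | 4.56 | 4.74 | 0.9620253165 | |
| Q13 | 6.62 | 6.02 | 1.099667774 | |
| Q14 | 16.98 | 15.83 | 1.072646873 | |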

https://docs.google.com/spreadsheets/d/1BnNEoaA37XxhCQwzwMxp2gRK04tp8hqnwbY6m1pVT1o/edit#gid=0
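
For readers checking the ratio column: it appears to be the Main timing divided by the new-hashtable timing, so values above 1 mean the new hash table was faster. The sketch below recomputes it for a few rows. It assumes both columns are running times where lower is better (the comment does not state units), and the function names `speedup_ratio` and `time_saved_percent` are illustrative, not part of the PR.

```rust
/// Ratio as it appears in the table: old (Main) timing divided by new timing.
/// Assumption: both inputs are running times, lower is better.
fn speedup_ratio(old: f64, new: f64) -> f64 {
    old / new
}

/// Share of the old running time that the new timing saves, in percent.
/// Negative values mean the new timing is slower.
fn time_saved_percent(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}

fn main() {
    // (query, Main, new hashtable) copied from the benchmark table.
    let rows = [
        ("Q2", 2.9, 1.98),
        ("Q5", 2.62, 1.78),
        ("Q7", 0.38, 0.3),
        ("Q12", 4.56, 4.74),
    ];
    for (query, old, new) in rows {
        println!(
            "{}: ratio {:.3}, time saved {:.1}%",
            query,
            speedup_ratio(old, new),
            time_saved_percent(old, new)
        );
    }
}
```

Note that a ratio of about 1.46 corresponds to roughly a 46% higher throughput but only about a 32% reduction in running time, so the two ways of quoting an "improvement" are not interchangeable.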

usamoi marked this pull request as ready for review on October 27, 2022 at 11:17.

PsiACE (Member) commented Oct 27, 2022

@usamoi Great job, glad to see you did it.

cc @sundy-li: once the version is released, we can consider updating the version in ClickBench.

Xuanwo (Member) commented Oct 27, 2022

@mergify update

mergify bot (Contributor) commented Oct 27, 2022

update

✅ Branch has been successfully updated

sundy-li (Member) left a comment

Awesome job!

Review thread on src/common/hashtable/src/table0.rs: resolved.
mergify bot merged commit ac00014 into databendlabs:main on Oct 27, 2022.

BohuTANG (Member) commented

Great job!
