-
Notifications
You must be signed in to change notification settings - Fork 761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[commons] arrow -> arrow2 #1239
Conversation
Thanks for the contribution! Please review the labels and make any necessary changes. |
Baseline benchmarks compare arrow1 vs arrow2 shows great performance improvement. https://github.com/sundy-li/learn/tree/master/arrow-vs-arrow2/benches
After migrated datafuse into arrow2, the performance shows not much difference. Maybe the bottleneck is still in upstream, such as Arrow2 makes the code be much more clear and simple, so it's still better to have it. |
Codecov Report
@@ Coverage Diff @@
## master #1239 +/- ##
========================================
Coverage 70% 71%
========================================
Files 476 472 -4
Lines 27683 29018 +1335
========================================
+ Hits 19629 20625 +996
- Misses 8054 8393 +339
Continue to review full report at Codecov.
|
Performance with my AMD Ryzen 7 PRO 4750U(Still faster in most cases, expect for last two case):
|
Ok, but the baseline of sort performance is poor, I'm trying to find the bottleneck in arrow2. It's strange to see the group by query be slower but the aggregation is faster, this pr did not change the group by behavior. |
I have seen that, if it solved, arrow2 all is top performance against arrow1 |
For the groupby, may be the first test is fast in that time, i think we can meaused that again when the sort sloved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
CI Passed |
I believe that the group by perf can be addressed with the |
The current design in datafuse is
If the key size is defined, we can allocate a large memory in one allocation, the size is |
I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/
Summary
Summary about this PR
Changelog
Related Issues
Fixes #1170
Test Plan
Unit Tests
Stateless Tests