Optimize hashing using `ahash` and `multiversion` (-30%) #428

Dandandan · 2021-09-19T22:13:29Z

This uses `T::get_hash`, which gives some speedup over going through the hash builder.

It also moves `ahash` to the `compute` feature and adds multiversioning to select/specialize on the necessary instructions.
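For readers unfamiliar with function multiversioning, here is a minimal hand-written sketch of the idea using only the standard library: compile the same body once for the baseline target and once with extra CPU features enabled, then pick a version at runtime. The names `mix_all`, `hash_fallback`, `hash_avx2` and `hash_dispatch` are hypothetical, the mixing step is a placeholder rather than the `ahash` algorithm, and the choice of AVX2 is an assumption made for illustration. The `multiversion` crate generates this kind of dispatch from an attribute; the exact targets used in this PR are not shown here.

```rust
/// Hypothetical mixing step (not the ahash algorithm): one multiply and one
/// xor-shift per value. `inline(always)` lets each caller compile its own copy.
#[inline(always)]
fn mix_all(values: &[u64], seed: u64) -> Vec<u64> {
    values
        .iter()
        .map(|&v| {
            let x = (v ^ seed).wrapping_mul(0x9E37_79B9_7F4A_7C15);
            x ^ (x >> 32)
        })
        .collect()
}

/// Portable version, compiled for the baseline target.
fn hash_fallback(values: &[u64], seed: u64) -> Vec<u64> {
    mix_all(values, seed)
}

/// Same body, but the compiler is allowed to use AVX2 instructions here.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn hash_avx2(values: &[u64], seed: u64) -> Vec<u64> {
    mix_all(values, seed)
}

/// Runtime dispatch: use the specialized version when the CPU supports it.
fn hash_dispatch(values: &[u64], seed: u64) -> Vec<u64> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            return unsafe { hash_avx2(values, seed) };
        }
    }
    hash_fallback(values, seed)
}

fn main() {
    let values = [1u64, 2, 3, 4];
    // Both versions must agree; only the generated instructions differ.
    assert_eq!(hash_dispatch(&values, 42), hash_fallback(&values, 42));
}
```

Every version has to return identical results, so the dispatch only affects speed, never the hash values themselves.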

codecov · 2021-09-19T22:26:52Z

Codecov Report

Merging #428 (b8db919) into main (55ff79c) will decrease coverage by 0.01%.
The diff coverage is 91.30%.

@@            Coverage Diff             @@
##             main     #428      +/-   ##
==========================================
- Coverage   80.80%   80.78%   -0.02%     
==========================================
  Files         353      372      +19     
  Lines       22649    22643       -6     
==========================================
- Hits        18302    18293       -9     
- Misses       4347     4350       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/array/ord.rs | `64.21% <81.81%> (ø)` | |
| src/compute/hash.rs | `93.33% <100.00%> (-0.67%)` | ⬇️ |
| src/compute/arithmetics/time.rs | `47.05% <0.00%> (-40.73%)` | ⬇️ |
| src/compute/arithmetics/mod.rs | `23.07% <0.00%> (-26.57%)` | ⬇️ |
| src/compute/contains.rs | `34.31% <0.00%> (-16.43%)` | ⬇️ |
| src/compute/take/mod.rs | `76.47% <0.00%> (-16.22%)` | ⬇️ |
| src/compute/aggregate/memory.rs | `25.00% <0.00%> (-15.00%)` | ⬇️ |
| src/compute/arithmetics/decimal/div.rs | `79.01% <0.00%> (-13.06%)` | ⬇️ |
| src/compute/arithmetics/decimal/mul.rs | `79.01% <0.00%> (-13.03%)` | ⬇️ |
| src/compute/aggregate/min_max.rs | `66.66% <0.00%> (-12.48%)` | ⬇️ |
| ... and 35 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

sundy-li · 2021-09-20T06:42:27Z

src/compute/hash.rs

-        },
-        DataType::UInt64,
-    )
+    let state = new_state!();


The state is initialized once per array. Will that produce a different hash result for another array?

Dandandan · 2021-09-20

No, it uses the same seeds each time.

sundy-li · 2021-09-23

OK, but there is a `build_hasher` call inside `get_hash`, which may introduce an extra allocation on each call. I did not find a way to improve that.

    #[inline]
    fn get_hash<H: Hash + ?Sized, B: BuildHasher>(value: &H, build_hasher: &B) -> u64 {
        let mut hasher = build_hasher.build_hasher();
        value.hash(&mut hasher);
        hasher.finish()
    }
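To make the exchange above concrete, here is a minimal standard-library sketch of the "same seeds each time" point. It assumes a fixed-seed `BuildHasher`; `BuildHasherDefault<DefaultHasher>` stands in for whatever `new_state!()` expands to, which is not shown in this thread. `hash_values` and `fixed_state` are hypothetical helpers, not code from the PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Same shape as the `get_hash` quoted in the review thread.
#[inline]
fn get_hash<H: Hash + ?Sized, B: BuildHasher>(value: &H, build_hasher: &B) -> u64 {
    let mut hasher = build_hasher.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: hash every value of one "array" with a shared state.
fn hash_values<T: Hash, B: BuildHasher>(values: &[T], state: &B) -> Vec<u64> {
    values.iter().map(|v| get_hash(v, state)).collect()
}

/// Stand-in for a fixed-seed state (assumption: the PR's state behaves the
/// same way, i.e. every new state starts from identical seeds).
fn fixed_state() -> BuildHasherDefault<DefaultHasher> {
    BuildHasherDefault::default()
}

fn main() {
    // Two separate arrays, each hashed with its own freshly created state.
    let a = hash_values(&[1i64, 2, 3], &fixed_state());
    let b = hash_values(&[3i64, 2, 1], &fixed_state());

    // Equal values hash identically across arrays because the seeds match.
    assert_eq!(a[0], b[2]);
    assert_eq!(a[1], b[1]);
    assert_eq!(a[2], b[0]);
}
```

In this sketch `build_hasher` returns the hasher by value, so creating one per call does not touch the heap. Whether the same holds for the hasher used in the PR would need to be checked against the `ahash` sources.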

jorgecarleitao · 2021-09-20T15:46:44Z

Awesome, thanks a lot! I updated the title to reflect the findings so that it shows up nicely in the changelog.

Dandandan added 4 commits September 19, 2021 23:59

Optimize hashing

193ed9f

Optimize hashing

3bf1c0b

Move ahash to compute

905da20

Formatting

2616d1a

Multi-versioning (30% speed up)

b8db919

sundy-li reviewed Sep 20, 2021

Dandandan changed the title ~~Optimize hashing~~ Optimize hashing using ahash and multiversioning Sep 20, 2021

jorgecarleitao merged commit 38361d2 into jorgecarleitao:main Sep 20, 2021

jorgecarleitao changed the title ~~Optimize hashing using ahash and multiversioning~~ Optimize hashing using ahash and multiversion (-30%) Sep 20, 2021

jorgecarleitao added the enhancement label (An improvement to an existing feature) Sep 20, 2021

Dandandan mentioned this pull request Sep 20, 2021

Experimenting with arrow2 apache/datafusion#68

Closed
