use exponential search to speed up lexico partition #585

jimexist · 2021-07-21T14:08:16Z

Which issue does this PR close?

Closes #586

Rationale for this change

Benchmark results (every case improves except the low-cardinality one, which regresses by about 8%):

lexicographical_partition_ranges(u8) 2^10
                        time:   [13.430 us 13.624 us 13.846 us]
                        change: [-20.446% -19.617% -18.718%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  9 (9.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

lexicographical_partition_ranges(u8) 2^12
                        time:   [21.706 us 22.377 us 23.029 us]
                        change: [-10.809% -8.7265% -6.6501%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

lexicographical_partition_ranges(u8) 2^10 with nulls
                        time:   [12.534 us 12.701 us 12.869 us]
                        change: [-21.677% -20.203% -18.676%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

lexicographical_partition_ranges(u8) 2^12 with nulls
                        time:   [21.408 us 21.631 us 21.883 us]
                        change: [-9.3607% -7.8667% -6.3528%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

lexicographical_partition_ranges(f64) 2^10
                        time:   [20.639 us 20.846 us 21.084 us]
                        change: [-64.686% -64.138% -63.561%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

lexicographical_partition_ranges(low cardinality) 1024
                        time:   [1.2718 us 1.2830 us 1.2953 us]
                        change: [+7.0358% +8.4813% +9.9578%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

What changes are included in this PR?

Adopt an exponential search in lexicographical_partition_ranges to locate each partition boundary.
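For readers unfamiliar with the technique, here is a minimal sketch of an exponential search for a partition point on a plain slice. It is not the code in this PR: the function name `exponential_partition_point` and the generic predicate are assumptions made for illustration, whereas the PR works on row indices with a lexicographical comparator. The sketch assumes the predicate is true for a prefix of the slice and false for the rest.

```rust
/// Illustrative helper (hypothetical name, not from the PR).
/// Returns the index of the first element for which `pred` is false,
/// assuming `pred` is true for a prefix of `slice` and false afterwards.
fn exponential_partition_point<T, P>(slice: &[T], pred: P) -> usize
where
    P: Fn(&T) -> bool,
{
    // Double `bound` until it runs past the end or lands on a false element.
    let mut bound = 1;
    while bound < slice.len() && pred(&slice[bound]) {
        bound *= 2;
    }
    // The boundary now lies in [bound / 2, min(len, bound + 1)).
    // `bound` itself may be the answer, so the right edge is bound + 1.
    let left = bound / 2;
    let right = slice.len().min(bound + 1);
    // Finish with a binary search restricted to that window.
    left + slice[left..right].partition_point(pred)
}
```

When the boundary sits close to the start of the slice, the doubling phase stops after a few probes and the binary search only covers a short window, which is the situation this approach is meant to exploit.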

Are there any user-facing changes?

arrow/src/compute/kernels/partition.rs

codecov-commenter · 2021-07-21T14:43:14Z

Codecov Report

Merging #585 (cafcac6) into master (d8da826) will increase coverage by 0.00%.
The diff coverage is 91.66%.

@@           Coverage Diff           @@
##           master     #585   +/-   ##
=======================================
  Coverage   82.46%   82.46%           
=======================================
  Files         167      167           
  Lines       46205    46213    +8     
=======================================
+ Hits        38101    38108    +7     
- Misses       8104     8105    +1

Impacted Files	Coverage Δ
arrow/src/compute/kernels/partition.rs	`97.65% <91.66%> (+0.15%)`	⬆️
parquet/src/encodings/encoding.rs	`94.85% <0.00%> (-0.20%)`	⬇️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

alamb

Looks like a good idea to me. I read the article and went through the code carefully, and I think it looks good.

Thanks @jimexist

alamb · 2021-07-25T10:52:02Z

arrow/src/compute/kernels/partition.rs

+    // note here we have right = min(indices.len(), bound + 1) because indices[bound] might
+    // actually be considered and must be included.
+    (bound / 2)
+        + indices[(bound / 2)..indices.len().min(bound + 1)]


I wonder if some of the performance improvement also comes from searching a potentially smaller range, i.e. (bound / 2) .. len rather than partition_point .. len.
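To make the size of that range concrete, the following sketch computes the window the final binary search would cover for a given boundary position. It follows the `bound / 2` and `min(len, bound + 1)` expressions from the snippet above; the helper name `search_window` and the `point` parameter are hypothetical and exist only for this illustration.

```rust
/// Hypothetical helper: returns the half-open window [left, right) that the
/// final binary search covers when a slice of length `len` has its
/// partition point at index `point`.
fn search_window(len: usize, point: usize) -> (usize, usize) {
    let mut bound = 1;
    // The predicate holds at index `bound` exactly when bound < point,
    // so this mirrors the doubling loop of an exponential search.
    while bound < len && bound < point {
        bound *= 2;
    }
    (bound / 2, len.min(bound + 1))
}
```

For example, with 1024 elements and the boundary at index 5, the window is (4, 9), five elements instead of the full slice. When the boundary is at the very end, the window is (512, 1024), so the saving is at most a factor of two in that case.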


github-actions bot added the arrow (Changes to the arrow crate) label Jul 21, 2021

jimexist marked this pull request as draft July 21, 2021 14:12

jimexist marked this pull request as ready for review July 21, 2021 14:15

jimexist commented Jul 21, 2021

arrow/src/compute/kernels/partition.rs

jimexist changed the title ~~use exponential search in lexico partition to speed up~~ use exponential search to speed up lexico partition Jul 21, 2021

jimexist force-pushed the exponential-search branch from cafcac6 to 91187ed Compare July 24, 2021 02:59

use exponential search

d60557c

jimexist force-pushed the exponential-search branch from 91187ed to d60557c Compare July 24, 2021 03:00

alamb approved these changes Jul 25, 2021

alamb merged commit 6c1a86e into apache:master Jul 25, 2021

alamb pushed a commit that referenced this pull request Jul 25, 2021

use exponential search for lexicographical_partition_ranges (#585)

f55f6bd

alamb added the cherry-picked label Jul 25, 2021

alamb mentioned this pull request Jul 25, 2021

Cherry pick use exponential search to speed up lexico partition to active_release #608

Merged

jimexist deleted the exponential-search branch July 25, 2021 13:50

alamb added a commit that referenced this pull request Jul 26, 2021

use exponential search for lexicographical_partition_ranges (#585) (#608)

820530d

Co-authored-by: Jiayu Liu <[email protected]>
