Speed up advancing within a block, take 2. #13958
PR apache#13692 tried to speed up advancing by using branchless binary search, but while that yielded a speedup on my machine, it yielded a slowdown on nightly benchmarks. This PR tries a different approach using vectorization. Experimentation suggests that it slows queries down a bit when advancing often goes to the very next doc ID, as in term queries and `OrHighNotXXX` tasks, but it speeds up queries that advance to the next few doc IDs, such as `AndHighHigh`. I think this is a good trade-off, since it slows down some already plenty fast queries in exchange for a speedup on some more expensive queries. Here is a `luceneutil` run on `wikibigall` with `-searchConcurrency 0`:

```
Task                    QPS baseline  StdDev  QPS my_modified_version  StdDev              Pct diff  p-value
OrHighNotHigh                 302.78  (2.4%)                   283.75  (2.9%)  -6.3% ( -11% -  -1%)    0.000
OrHighNotMed                  384.69  (3.0%)                   363.33  (2.8%)  -5.6% ( -10% -   0%)    0.000
MedTerm                       564.86  (2.2%)                   537.04  (3.5%)  -4.9% ( -10% -   0%)    0.000
LowTerm                      1014.02  (2.2%)                   967.37  (3.6%)  -4.6% ( -10% -   1%)    0.000
OrHighNotLow                  446.38  (3.4%)                   427.10  (3.3%)  -4.3% ( -10% -   2%)    0.000
HighTerm                      485.41  (1.9%)                   464.49  (3.2%)  -4.3% (  -9% -   0%)    0.000
OrNotHighHigh                 229.78  (2.4%)                   221.51  (3.1%)  -3.6% (  -8% -   1%)    0.000
OrNotHighMed                  396.63  (2.7%)                   382.41  (3.1%)  -3.6% (  -9% -   2%)    0.000
Prefix3                       145.65  (3.6%)                   142.39  (3.7%)  -2.2% (  -9% -   5%)    0.051
IntNRQ                        158.04  (4.7%)                   154.77  (5.6%)  -2.1% ( -11% -   8%)    0.205
CountTerm                    8320.96  (3.2%)                  8198.56  (4.7%)  -1.5% (  -9% -   6%)    0.246
PKLookup                      273.35  (3.6%)                   269.71  (5.2%)  -1.3% (  -9% -   7%)    0.345
Wildcard                       83.30  (3.4%)                    82.28  (3.1%)  -1.2% (  -7% -   5%)    0.234
HighTermMonthSort            3235.98  (3.1%)                  3198.04  (2.9%)  -1.2% (  -6% -   4%)    0.215
HighTermTitleSort             148.94  (2.5%)                   148.38  (2.6%)  -0.4% (  -5% -   4%)    0.638
CountOrHighMed                104.51  (2.0%)                   104.22  (1.7%)  -0.3% (  -3% -   3%)    0.640
HighTermTitleBDVSort           14.67  (5.3%)                    14.64  (5.9%)  -0.2% ( -10% -  11%)    0.899
AndStopWords                   30.68  (3.0%)                    30.66  (2.7%)  -0.1% (  -5% -   5%)    0.941
CountOrHighHigh                50.17  (2.0%)                    50.19  (1.9%)   0.0% (  -3% -   3%)    0.947
OrHighRare                    273.82  (4.5%)                   273.96  (3.8%)   0.0% (  -7% -   8%)    0.971
TermDTSort                    353.37  (6.4%)                   354.23  (6.7%)   0.2% ( -12% -  14%)    0.907
Fuzzy1                         77.85  (2.6%)                    78.12  (2.0%)   0.3% (  -4% -   4%)    0.633
Fuzzy2                         73.23  (2.5%)                    73.50  (1.9%)   0.4% (  -3% -   4%)    0.594
HighTermDayOfYearSort         836.62  (3.1%)                   841.07  (4.0%)   0.5% (  -6% -   7%)    0.639
And2Terms2StopWords           154.49  (1.8%)                   155.41  (2.1%)   0.6% (  -3% -   4%)    0.340
OrHighLow                     771.90  (2.0%)                   778.20  (2.2%)   0.8% (  -3% -   5%)    0.217
And3Terms                     167.63  (2.3%)                   169.23  (2.2%)   1.0% (  -3% -   5%)    0.176
OrStopWords                    33.99  (4.6%)                    34.39  (4.1%)   1.2% (  -7% -  10%)    0.388
CountAndHighMed               148.01  (2.4%)                   149.91  (1.0%)   1.3% (  -2% -   4%)    0.025
Or2Terms2StopWords            156.93  (2.8%)                   159.21  (3.0%)   1.5% (  -4% -   7%)    0.117
AndHighHigh                    67.06  (1.3%)                    68.07  (1.6%)   1.5% (  -1% -   4%)    0.001
OrMany                         18.67  (2.9%)                    18.96  (2.9%)   1.5% (  -4% -   7%)    0.089
AndHighMed                    185.02  (1.6%)                   189.06  (1.3%)   2.2% (   0% -   5%)    0.000
AndHighLow                    948.34  (2.6%)                   970.47  (2.6%)   2.3% (  -2% -   7%)    0.004
OrHighHigh                     68.42  (1.4%)                    70.08  (1.3%)   2.4% (   0% -   5%)    0.000
Or3Terms                      166.47  (2.7%)                   171.10  (3.1%)   2.8% (  -2% -   8%)    0.003
OrNotHighLow                  964.69  (3.1%)                   994.46  (3.3%)   3.1% (  -3% -   9%)    0.002
OrHighMed                     222.32  (2.1%)                   230.93  (1.5%)   3.9% (   0% -   7%)    0.000
CountAndHighHigh               48.88  (2.4%)                    52.87  (1.3%)   8.2% (   4% -  12%)    0.000
```
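To make the approach concrete, here is a minimal sketch of this kind of vectorized advance using the JDK's incubating Panama Vector API (run with `--add-modules jdk.incubator.vector`). The `findNextGEQ` name, the `long[]` block layout, and the scalar tail are illustrative assumptions, not the PR's actual code:

```java
import jdk.incubator.vector.LongVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class VectorAdvanceSketch {
  private static final VectorSpecies<Long> SPECIES = LongVector.SPECIES_PREFERRED;

  /** Returns the index of the first doc ID in docs[from, to) that is >= target, or to. */
  static int findNextGEQ(long[] docs, int from, int to, long target) {
    int i = from;
    // Compare a full vector of doc IDs per iteration instead of one branch per element.
    for (int bound = from + SPECIES.loopBound(to - from); i < bound; i += SPECIES.length()) {
      LongVector block = LongVector.fromArray(SPECIES, docs, i);
      VectorMask<Long> geq = block.compare(VectorOperators.GE, target);
      if (geq.anyTrue()) {
        return i + geq.firstTrue(); // index of the first matching lane
      }
    }
    // Scalar tail for the last few elements that don't fill a vector.
    for (; i < to; i++) {
      if (docs[i] >= target) {
        return i;
      }
    }
    return to;
  }
}
```

Because the doc IDs in a block are sorted, the first set lane in the mask is the answer; when the target is usually within the next few doc IDs (the `AndHighHigh` case), one vector comparison replaces several scalar iterations.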
And I seem to be getting a better speedup by using
I ran this PR on my Mac laptop (M3), where it gives a massive slowdown, I imagine because some of the vector operations I'm using are emulated. I need to find what to check against in order to avoid this, like we did for vectors.
You are using VectorMask; only use this where it is implemented in HW (AVX-512 and ARM SVE).
For these uses of VectorMask you are OK with AVX2 (so just use the existing FAST_INTEGER_VECTORS check): https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1597-L1603. So if you want to add this one without slowdowns, I would check:
Maybe it's a bug that it doesn't work on your Mac either, because elsewhere they have code that looks like it is supposed to be doing this stuff: https://github.com/openjdk/jdk/blob/f1a9a8d25b2e1f9b5dbe8719abb66ec4cd9057dc/src/hotspot/cpu/aarch64/aarch64_vector_ad.m4#L3782
I did more digging: vectorization actually worked on my Mac! So my best guess is that I got a ~20% slowdown because I only have 2 lanes on it. For now I disabled the optimization on machines that have fewer than 4 lanes; I'll try to run benchmarks on more CPUs to confirm it's not only helpful on my desktop CPU (AMD Ryzen 9 3900X).
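For illustration, a guard combining the two conditions discussed above could look like the sketch below. The stand-in detection of the fast-integer-vectors check is hypothetical; the real check rmuir references lives in Lucene's vectorization internals:

```java
import jdk.incubator.vector.LongVector;

final class VectorGateSketch {
  // Hypothetical stand-in for the existing fast-integer-vectors platform check;
  // how Lucene actually detects this is not shown here.
  private static final boolean HAS_FAST_INTEGER_VECTORS =
      Boolean.getBoolean("sketch.hasFastIntegerVectors");

  // Enable the vectorized advance only when integer vector masks are fast in
  // hardware (AVX2 or better on x86) and at least 4 lanes are available: with
  // only 2 lanes (128-bit vectors over 64-bit longs, as on an M3), the vector
  // loop lost to the scalar loop.
  static final boolean USE_VECTORIZED_ADVANCE =
      HAS_FAST_INTEGER_VECTORS && LongVector.SPECIES_PREFERRED.length() >= 4;
}
```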
Here's a `wikimediumall` run on a c7i.2xlarge instance that supports AVX-512:
I plan on merging this change soon, and then looking into moving postings back to int[] arrays, to hopefully get benefits from having 2x more lanes that can be compared at once.
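For context on the 2x claim: at a fixed vector width, an int species has twice as many lanes as a long species. A tiny standalone check (an assumed snippet, not from the PR):

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.LongVector;

// Prints the preferred lane counts; on 256-bit AVX2 this is 8 ints vs 4 longs,
// so int[] postings double the doc IDs compared per vector iteration.
public class LaneCount {
  public static void main(String[] args) {
    System.out.println("int lanes:  " + IntVector.SPECIES_PREFERRED.length());
    System.out.println("long lanes: " + LongVector.SPECIES_PREFERRED.length());
  }
}
```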
Nightly benchmarks just picked up the change with a mix of speedups and slowdowns: https://benchmarks.mikemccandless.com/2024.10.30.18.12.23.html. Here are the main ones I'm seeing:

Speedups:
Slowdowns:
I'm a bit surprised/disappointed at the
If you check out data at #13692 (comment),