Speed up advancing within a block, take 2. #13958

Merged
merged 15 commits into from
Oct 30, 2024

@jpountz jpountz commented Oct 25, 2024

PR #13692 tried to speed up advancing by using branchless binary search, but while this yielded a speedup on my machine, it caused a slowdown on nightly benchmarks.

This PR tries a different approach using vectorization. Experimentation suggests that it slightly slows down queries whose advance calls often land on the very next doc ID, such as term queries and OrHighNotXXX tasks, but speeds up queries that advance to one of the next few doc IDs, such as AndHighHigh. I think this is a good trade-off, since it slows down some already fast queries in exchange for a speedup on some more expensive ones.
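
To make the idea concrete, here is a minimal sketch of advancing within a sorted block by looking at several doc IDs at once. It is a scalar emulation in plain Java, not the code from this PR: the class name, method names, and the `lanes` parameter are hypothetical, and a real implementation would replace the inner loop with a single vector comparison from `jdk.incubator.vector`.

```java
// Scalar emulation of a lane-wise "advance within a block" (illustrative only).
final class BlockAdvanceSketch {

  /** Baseline: linear scan for the first index whose doc ID is >= target. */
  static int linearAdvance(long[] docs, int from, long target) {
    int i = from;
    while (i < docs.length && docs[i] < target) {
      i++;
    }
    return i;
  }

  /**
   * Chunked variant: inspect `lanes` sorted doc IDs at a time, count how many
   * are below the target, and skip exactly that many. If the count is smaller
   * than `lanes`, the target position lies inside the current chunk.
   */
  static int chunkedAdvance(long[] docs, int from, long target, int lanes) {
    int i = from;
    while (i + lanes <= docs.length) {
      int below = 0;
      for (int lane = 0; lane < lanes; lane++) {
        // With the Vector API this loop would be one compare plus a mask count.
        below += docs[i + lane] < target ? 1 : 0;
      }
      i += below;
      if (below != lanes) {
        return i; // first doc ID >= target found in this chunk
      }
    }
    // Fewer than `lanes` values left: finish with the plain linear scan.
    return linearAdvance(docs, i, target);
  }
}
```

When the target is the very next doc ID, the chunked variant still compares a full chunk, which is one plausible reason for the small slowdown on term queries reported above.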

Here is a luceneutil run on wikibigall with -searchConcurrency 0:

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                   OrHighNotHigh      302.78      (2.4%)      283.75      (2.9%)   -6.3% ( -11% -   -1%) 0.000
                    OrHighNotMed      384.69      (3.0%)      363.33      (2.8%)   -5.6% ( -10% -    0%) 0.000
                         MedTerm      564.86      (2.2%)      537.04      (3.5%)   -4.9% ( -10% -    0%) 0.000
                         LowTerm     1014.02      (2.2%)      967.37      (3.6%)   -4.6% ( -10% -    1%) 0.000
                    OrHighNotLow      446.38      (3.4%)      427.10      (3.3%)   -4.3% ( -10% -    2%) 0.000
                        HighTerm      485.41      (1.9%)      464.49      (3.2%)   -4.3% (  -9% -    0%) 0.000
                   OrNotHighHigh      229.78      (2.4%)      221.51      (3.1%)   -3.6% (  -8% -    1%) 0.000
                    OrNotHighMed      396.63      (2.7%)      382.41      (3.1%)   -3.6% (  -9% -    2%) 0.000
                         Prefix3      145.65      (3.6%)      142.39      (3.7%)   -2.2% (  -9% -    5%) 0.051
                          IntNRQ      158.04      (4.7%)      154.77      (5.6%)   -2.1% ( -11% -    8%) 0.205
                       CountTerm     8320.96      (3.2%)     8198.56      (4.7%)   -1.5% (  -9% -    6%) 0.246
                        PKLookup      273.35      (3.6%)      269.71      (5.2%)   -1.3% (  -9% -    7%) 0.345
                        Wildcard       83.30      (3.4%)       82.28      (3.1%)   -1.2% (  -7% -    5%) 0.234
               HighTermMonthSort     3235.98      (3.1%)     3198.04      (2.9%)   -1.2% (  -6% -    4%) 0.215
               HighTermTitleSort      148.94      (2.5%)      148.38      (2.6%)   -0.4% (  -5% -    4%) 0.638
                  CountOrHighMed      104.51      (2.0%)      104.22      (1.7%)   -0.3% (  -3% -    3%) 0.640
            HighTermTitleBDVSort       14.67      (5.3%)       14.64      (5.9%)   -0.2% ( -10% -   11%) 0.899
                    AndStopWords       30.68      (3.0%)       30.66      (2.7%)   -0.1% (  -5% -    5%) 0.941
                 CountOrHighHigh       50.17      (2.0%)       50.19      (1.9%)    0.0% (  -3% -    3%) 0.947
                      OrHighRare      273.82      (4.5%)      273.96      (3.8%)    0.0% (  -7% -    8%) 0.971
                      TermDTSort      353.37      (6.4%)      354.23      (6.7%)    0.2% ( -12% -   14%) 0.907
                          Fuzzy1       77.85      (2.6%)       78.12      (2.0%)    0.3% (  -4% -    4%) 0.633
                          Fuzzy2       73.23      (2.5%)       73.50      (1.9%)    0.4% (  -3% -    4%) 0.594
           HighTermDayOfYearSort      836.62      (3.1%)      841.07      (4.0%)    0.5% (  -6% -    7%) 0.639
             And2Terms2StopWords      154.49      (1.8%)      155.41      (2.1%)    0.6% (  -3% -    4%) 0.340
                       OrHighLow      771.90      (2.0%)      778.20      (2.2%)    0.8% (  -3% -    5%) 0.217
                       And3Terms      167.63      (2.3%)      169.23      (2.2%)    1.0% (  -3% -    5%) 0.176
                     OrStopWords       33.99      (4.6%)       34.39      (4.1%)    1.2% (  -7% -   10%) 0.388
                 CountAndHighMed      148.01      (2.4%)      149.91      (1.0%)    1.3% (  -2% -    4%) 0.025
              Or2Terms2StopWords      156.93      (2.8%)      159.21      (3.0%)    1.5% (  -4% -    7%) 0.117
                     AndHighHigh       67.06      (1.3%)       68.07      (1.6%)    1.5% (  -1% -    4%) 0.001
                          OrMany       18.67      (2.9%)       18.96      (2.9%)    1.5% (  -4% -    7%) 0.089
                      AndHighMed      185.02      (1.6%)      189.06      (1.3%)    2.2% (   0% -    5%) 0.000
                      AndHighLow      948.34      (2.6%)      970.47      (2.6%)    2.3% (  -2% -    7%) 0.004
                      OrHighHigh       68.42      (1.4%)       70.08      (1.3%)    2.4% (   0% -    5%) 0.000
                        Or3Terms      166.47      (2.7%)      171.10      (3.1%)    2.8% (  -2% -    8%) 0.003
                    OrNotHighLow      964.69      (3.1%)      994.46      (3.3%)    3.1% (  -3% -    9%) 0.002
                       OrHighMed      222.32      (2.1%)      230.93      (1.5%)    3.9% (   0% -    7%) 0.000
                CountAndHighHigh       48.88      (2.4%)       52.87      (1.3%)    8.2% (   4% -   12%) 0.000

@jpountz jpountz added this to the 10.1.0 milestone Oct 25, 2024
Contributor Author

jpountz commented Oct 25, 2024

Specializing ImpactsDISI#nextDoc() helped get rid of the slowdown:

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                    AndStopWords       31.34      (1.8%)       30.84      (4.0%)   -1.6% (  -7% -    4%) 0.105
                       CountTerm     8573.12      (3.8%)     8449.05      (4.7%)   -1.4% (  -9% -    7%) 0.284
                  CountOrHighMed      105.75      (2.1%)      104.50      (1.4%)   -1.2% (  -4% -    2%) 0.039
                      TermDTSort      363.06      (6.4%)      358.98      (6.6%)   -1.1% ( -13% -   12%) 0.585
                 CountOrHighHigh       50.62      (2.4%)       50.28      (1.7%)   -0.7% (  -4% -    3%) 0.305
                          IntNRQ      453.67      (4.7%)      451.13      (4.5%)   -0.6% (  -9% -    9%) 0.700
                      OrHighRare      283.32      (3.8%)      282.52      (3.8%)   -0.3% (  -7% -    7%) 0.813
                          Fuzzy1       78.58      (2.1%)       78.42      (3.0%)   -0.2% (  -5% -    5%) 0.812
           HighTermDayOfYearSort      850.86      (4.4%)      849.52      (3.0%)   -0.2% (  -7% -    7%) 0.895
            HighTermTitleBDVSort       13.97      (6.3%)       13.96      (5.5%)   -0.1% ( -11% -   12%) 0.974
             And2Terms2StopWords      157.31      (1.3%)      157.27      (2.2%)   -0.0% (  -3% -    3%) 0.965
                         LowTerm      985.67      (3.0%)      986.01      (1.8%)    0.0% (  -4% -    4%) 0.964
               HighTermMonthSort     3216.69      (2.2%)     3217.92      (3.9%)    0.0% (  -5% -    6%) 0.969
                          Fuzzy2       73.69      (2.0%)       73.74      (2.4%)    0.1% (  -4% -    4%) 0.910
                     AndHighHigh       65.88      (2.1%)       66.18      (2.0%)    0.5% (  -3% -    4%) 0.472
                       And3Terms      169.85      (2.0%)      170.81      (2.4%)    0.6% (  -3% -    5%) 0.424
                          OrMany       19.10      (1.7%)       19.22      (1.7%)    0.6% (  -2% -    4%) 0.237
              Or2Terms2StopWords      160.88      (1.4%)      161.91      (2.0%)    0.6% (  -2% -    4%) 0.241
                     OrStopWords       34.90      (1.4%)       35.15      (3.9%)    0.7% (  -4% -    6%) 0.450
                       OrHighLow      799.18      (1.6%)      805.33      (1.5%)    0.8% (  -2% -    3%) 0.117
                 CountAndHighMed      149.99      (3.1%)      151.23      (1.1%)    0.8% (  -3% -    5%) 0.261
                        Wildcard       88.47      (2.7%)       89.32      (3.2%)    1.0% (  -4% -    7%) 0.309
                        PKLookup      270.87      (3.8%)      273.47      (1.7%)    1.0% (  -4% -    6%) 0.307
                         Prefix3       93.00      (8.2%)       94.14      (6.3%)    1.2% ( -12% -   17%) 0.599
                         MedTerm      690.05      (2.6%)      701.55      (1.3%)    1.7% (  -2% -    5%) 0.010
                    OrHighNotMed      359.57      (2.7%)      366.02      (1.9%)    1.8% (  -2% -    6%) 0.014
                        Or3Terms      170.81      (1.3%)      173.98      (2.1%)    1.9% (  -1% -    5%) 0.001
                    OrHighNotLow      432.25      (3.4%)      440.76      (2.4%)    2.0% (  -3% -    8%) 0.035
               HighTermTitleSort      159.15      (4.8%)      162.44      (2.9%)    2.1% (  -5% -   10%) 0.096
                      AndHighMed      225.25      (2.6%)      229.93      (1.4%)    2.1% (  -1% -    6%) 0.002
                        HighTerm      455.45      (2.4%)      465.69      (2.1%)    2.2% (  -2% -    6%) 0.002
                      OrHighHigh       78.87      (1.5%)       80.64      (1.5%)    2.3% (   0% -    5%) 0.000
                   OrHighNotHigh      218.32      (2.7%)      224.10      (2.0%)    2.6% (  -2% -    7%) 0.000
                    OrNotHighLow     1111.11      (2.8%)     1144.28      (2.5%)    3.0% (  -2% -    8%) 0.000
                       OrHighMed      267.13      (1.8%)      275.57      (1.3%)    3.2% (   0% -    6%) 0.000
                    OrNotHighMed      303.24      (3.0%)      313.56      (2.5%)    3.4% (  -2% -    9%) 0.000
                   OrNotHighHigh      230.18      (2.8%)      238.62      (2.2%)    3.7% (  -1% -    8%) 0.000
                      AndHighLow      866.39      (2.7%)      903.54      (2.4%)    4.3% (   0% -    9%) 0.000
                CountAndHighHigh       49.60      (3.1%)       53.54      (0.9%)    7.9% (   3% -   12%) 0.000

Contributor Author

jpountz commented Oct 25, 2024

And I seem to be getting a better speedup by using trueCount() instead of firstTrue():

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                       CountTerm     8621.82      (5.6%)     8504.44      (4.6%)   -1.4% ( -10% -    9%) 0.401
                    AndStopWords       31.14      (1.4%)       30.83      (4.6%)   -1.0% (  -6% -    5%) 0.363
                         Prefix3       96.42      (5.7%)       95.50      (4.4%)   -1.0% ( -10% -    9%) 0.557
            HighTermTitleBDVSort       15.80      (6.0%)       15.65      (5.0%)   -0.9% ( -11% -   10%) 0.587
                     OrStopWords       34.67      (2.9%)       34.45      (5.7%)   -0.6% (  -8% -    8%) 0.657
                    OrNotHighMed      385.71      (4.2%)      384.12      (3.2%)   -0.4% (  -7% -    7%) 0.725
                      TermDTSort      346.51      (5.7%)      345.26      (6.2%)   -0.4% ( -11% -   12%) 0.847
               HighTermTitleSort      153.13      (1.7%)      152.59      (3.3%)   -0.4% (  -5% -    4%) 0.670
                          OrMany       19.06      (1.6%)       18.99      (3.2%)   -0.3% (  -5% -    4%) 0.671
               HighTermMonthSort     3126.69      (2.9%)     3117.99      (3.7%)   -0.3% (  -6% -    6%) 0.791
                 CountOrHighHigh       50.32      (1.6%)       50.26      (2.1%)   -0.1% (  -3% -    3%) 0.862
                  CountOrHighMed      104.69      (1.7%)      104.70      (2.0%)    0.0% (  -3% -    3%) 0.981
                        PKLookup      270.86      (2.7%)      270.98      (2.7%)    0.0% (  -5% -    5%) 0.960
                      OrHighRare      281.93      (3.4%)      282.35      (4.8%)    0.1% (  -7% -    8%) 0.911
                        Wildcard       49.07      (3.7%)       49.15      (4.2%)    0.2% (  -7% -    8%) 0.893
              Or2Terms2StopWords      160.10      (1.5%)      160.52      (3.5%)    0.3% (  -4% -    5%) 0.756
             And2Terms2StopWords      156.75      (1.5%)      157.35      (2.8%)    0.4% (  -3% -    4%) 0.586
                       OrHighLow      855.65      (2.4%)      859.93      (2.7%)    0.5% (  -4% -    5%) 0.542
           HighTermDayOfYearSort      800.87      (2.8%)      805.06      (2.9%)    0.5% (  -5% -    6%) 0.562
                       And3Terms      169.90      (1.5%)      170.87      (3.1%)    0.6% (  -3% -    5%) 0.455
                          Fuzzy1       77.88      (3.3%)       78.52      (2.9%)    0.8% (  -5% -    7%) 0.409
                          Fuzzy2       73.27      (3.0%)       73.93      (2.4%)    0.9% (  -4% -    6%) 0.295
                    OrNotHighLow     1099.84      (3.7%)     1114.61      (3.8%)    1.3% (  -5% -    9%) 0.260
                        Or3Terms      169.45      (1.5%)      171.80      (3.7%)    1.4% (  -3% -    6%) 0.118
                 CountAndHighMed      148.89      (2.5%)      151.58      (3.0%)    1.8% (  -3% -    7%) 0.040
                         LowTerm     1033.62      (3.6%)     1052.61      (2.8%)    1.8% (  -4% -    8%) 0.075
                    OrHighNotMed      371.62      (3.1%)      378.74      (3.5%)    1.9% (  -4% -    8%) 0.066
                   OrHighNotHigh      296.15      (3.1%)      302.30      (3.1%)    2.1% (  -4% -    8%) 0.036
                     AndHighHigh       70.55      (1.6%)       72.20      (2.4%)    2.3% (  -1% -    6%) 0.000
                      OrHighHigh       94.03      (1.6%)       96.25      (2.0%)    2.4% (  -1% -    6%) 0.000
                    OrHighNotLow      442.74      (3.0%)      454.42      (3.6%)    2.6% (  -3% -    9%) 0.011
                       OrHighMed      232.09      (2.5%)      238.43      (2.5%)    2.7% (  -2% -    7%) 0.001
                          IntNRQ      110.25     (15.4%)      113.35     (17.9%)    2.8% ( -26% -   42%) 0.594
                         MedTerm      601.09      (3.7%)      619.19      (2.2%)    3.0% (  -2% -    9%) 0.002
                      AndHighMed      221.49      (1.9%)      228.33      (2.4%)    3.1% (  -1% -    7%) 0.000
                        HighTerm      520.52      (3.4%)      537.37      (2.6%)    3.2% (  -2% -    9%) 0.001
                      AndHighLow     1047.38      (2.8%)     1082.62      (2.7%)    3.4% (  -2% -    9%) 0.000
                   OrNotHighHigh      276.13      (3.5%)      286.23      (3.4%)    3.7% (  -3% -   10%) 0.001
                CountAndHighHigh       49.28      (2.3%)       54.98      (2.4%)   11.6% (   6% -   16%) 0.000
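
For readers unfamiliar with the two mask reductions mentioned in this comment, here is a small scalar emulation of what they compute. This is only a sketch: the comment does not say which comparison predicate each variant used, so the pairing below (`< target` with a count, `>= target` with a first-set-lane search) is an assumption, and the class and method names are hypothetical. In the Vector API, `VectorMask.trueCount()` returns the number of set lanes and `VectorMask.firstTrue()` returns the index of the first set lane, or the vector length if none is set.

```java
// Scalar emulation of two VectorMask reductions over one chunk of sorted values.
final class MaskReductionSketch {

  /** Emulates trueCount() on the mask (values[lane] < target). */
  static int trueCountBelow(long[] values, int offset, int lanes, long target) {
    int count = 0;
    for (int lane = 0; lane < lanes; lane++) {
      if (values[offset + lane] < target) {
        count++;
      }
    }
    return count;
  }

  /** Emulates firstTrue() on the mask (values[lane] >= target); returns lanes if no lane is set. */
  static int firstTrueAtOrAbove(long[] values, int offset, int lanes, long target) {
    for (int lane = 0; lane < lanes; lane++) {
      if (values[offset + lane] >= target) {
        return lane;
      }
    }
    return lanes;
  }
}
```

Because doc IDs within a block are sorted, both reductions yield the same number of positions to skip, so choosing between them is purely a performance question, which is what the benchmark above measures.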

Contributor Author

jpountz commented Oct 25, 2024

I ran this PR on my Mac laptop (M3), where it gives a massive slowdown, I imagine because some of the vector operations I'm using are emulated. I need to find what to check against in order to avoid this, like we did for vectors with PanamaVectorConstants.HAS_FAST_INTEGER_VECTORS.

Member

rmuir commented Oct 25, 2024

You are using VectorMask; only use this where it is implemented in hardware (AVX-512 and ARM SVE).

Member

rmuir commented Oct 25, 2024

For these uses of VectorMask you are OK with AVX2 (so just use the existing FAST_INTEGER_VECTORS check):

https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1597-L1603

So if you want to add this one without slowdowns, I would check: FAST_INTEGER_VECTORS && amd64

Member

rmuir commented Oct 25, 2024

Maybe it's a bug that it doesn't work on your Mac either, because elsewhere they have code that looks like it is supposed to be doing this stuff: https://github.com/openjdk/jdk/blob/f1a9a8d25b2e1f9b5dbe8719abb66ec4cd9057dc/src/hotspot/cpu/aarch64/aarch64_vector_ad.m4#L3782

Contributor Author

jpountz commented Oct 27, 2024

I did more digging: vectorization actually worked on my Mac! So my best guess is that I got a ~20% slowdown because I only have 2 lanes on it, so the condition trueCount != LONG_SPECIES.length() is much less likely to hold on my Mac than on my Linux desktop, which has 4 lanes, and this hurts more than it helps compared to the naive linear scan. (See #13692 (comment) for some stats about how far advance() needs to go within a block.)

For now I disabled the optimization on machines which have fewer than 4 lanes. I'll try to run benchmarks on more CPUs to confirm it's not only helpful on my desktop CPU (AMD Ryzen 9 3900X).
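
As a rough illustration of the lane-count gate described here, the sketch below derives the number of 64-bit lanes from a vector register width and enables the optimization only with at least 4 lanes. The class, method names, and the boolean parameter are hypothetical stand-ins; the actual check in Lucene may be structured differently, and the register widths in the comments assume 128-bit vectors on the Mac and 256-bit vectors on the desktop, which matches the 2 and 4 lanes reported above.

```java
// Hypothetical gate for a vectorized advance over long[] doc IDs.
final class VectorAdvanceGateSketch {

  /** Number of 64-bit lanes that fit in a vector register of the given bit width. */
  static int longLanes(int vectorBitSize) {
    return vectorBitSize / Long.SIZE;
  }

  /**
   * Enable the vectorized path only when integer vectors are fast on this
   * platform (a stand-in for a flag like HAS_FAST_INTEGER_VECTORS) and at
   * least 4 doc IDs can be compared per vector operation.
   */
  static boolean useVectorizedAdvance(int vectorBitSize, boolean fastIntegerVectors) {
    return fastIntegerVectors && longLanes(vectorBitSize) >= 4;
  }
}
```

Under these assumptions a 128-bit register gives 2 lanes (disabled), a 256-bit register gives 4 lanes (enabled), and a 512-bit register gives 8 lanes (enabled).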

Contributor Author

jpountz commented Oct 28, 2024

Here's a luceneutil/wikibigall run on the latest version of the code on my Linux desktop:

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      271.24      (4.0%)      265.30      (5.2%)   -2.2% ( -11% -    7%) 0.139
                         Prefix3      187.84      (3.8%)      186.03      (4.6%)   -1.0% (  -9% -    7%) 0.470
               HighTermMonthSort     3167.67      (5.2%)     3137.73      (3.9%)   -0.9% (  -9% -    8%) 0.517
                        Wildcard       52.73      (5.1%)       52.48      (4.5%)   -0.5% (  -9% -    9%) 0.760
                  CountOrHighMed      106.16      (2.2%)      105.78      (2.0%)   -0.4% (  -4% -    3%) 0.590
                          Fuzzy2       72.80      (3.2%)       72.56      (4.2%)   -0.3% (  -7% -    7%) 0.780
                      OrHighRare      275.15      (5.4%)      274.93      (6.5%)   -0.1% ( -11% -   12%) 0.967
                 CountOrHighHigh       50.87      (2.1%)       50.84      (2.1%)   -0.0% (  -4% -    4%) 0.943
                          Fuzzy1       77.54      (2.5%)       77.67      (2.9%)    0.2% (  -5% -    5%) 0.844
            HighTermTitleBDVSort       19.93      (2.6%)       20.01      (2.6%)    0.4% (  -4% -    5%) 0.647
                          OrMany       19.08      (3.3%)       19.25      (3.1%)    0.9% (  -5% -    7%) 0.385
               HighTermTitleSort      152.92      (4.1%)      154.51      (2.7%)    1.0% (  -5% -    8%) 0.345
                       CountTerm     8423.54      (6.4%)     8527.55      (5.6%)    1.2% ( -10% -   14%) 0.516
              Or2Terms2StopWords      162.03      (2.4%)      164.89      (4.8%)    1.8% (  -5% -    9%) 0.140
                       OrHighLow      825.61      (3.5%)      843.30      (3.9%)    2.1% (  -5% -    9%) 0.066
                      TermDTSort      339.37      (8.5%)      347.18      (7.6%)    2.3% ( -12% -   20%) 0.364
                     OrStopWords       35.47      (3.0%)       36.35      (3.2%)    2.5% (  -3% -    8%) 0.012
                     AndHighHigh       79.10      (2.5%)       81.11      (2.9%)    2.5% (  -2% -    8%) 0.003
           HighTermDayOfYearSort      752.66      (5.6%)      772.27      (5.0%)    2.6% (  -7% -   14%) 0.121
             And2Terms2StopWords      158.28      (3.9%)      162.47      (3.8%)    2.6% (  -4% -   10%) 0.029
                    OrHighNotMed      398.06      (3.7%)      408.91      (3.4%)    2.7% (  -4% -   10%) 0.015
                    OrNotHighMed      362.36      (4.5%)      372.93      (4.5%)    2.9% (  -5% -   12%) 0.041
                        HighTerm      481.48      (3.3%)      495.74      (2.5%)    3.0% (  -2% -    9%) 0.001
                        Or3Terms      173.63      (1.8%)      178.80      (3.3%)    3.0% (  -2% -    8%) 0.000
                    AndStopWords       31.78      (4.4%)       32.73      (2.4%)    3.0% (  -3% -   10%) 0.007
                          IntNRQ      122.71     (14.1%)      126.45     (17.3%)    3.0% ( -24% -   40%) 0.542
                       And3Terms      173.42      (3.1%)      178.74      (2.1%)    3.1% (  -2% -    8%) 0.000
                   OrHighNotHigh      267.41      (3.9%)      275.76      (3.7%)    3.1% (  -4% -   11%) 0.009
                         LowTerm     1049.31      (4.4%)     1082.55      (4.7%)    3.2% (  -5% -   12%) 0.027
                         MedTerm      620.73      (4.0%)      640.91      (2.7%)    3.3% (  -3% -   10%) 0.003
                      AndHighMed      237.62      (3.1%)      245.94      (2.6%)    3.5% (  -2% -    9%) 0.000
                 CountAndHighMed      149.93      (3.8%)      155.31      (2.6%)    3.6% (  -2% -   10%) 0.000
                      OrHighHigh       63.84      (2.6%)       66.36      (3.6%)    3.9% (  -2% -   10%) 0.000
                    OrNotHighLow     1115.58      (3.6%)     1162.82      (4.4%)    4.2% (  -3% -   12%) 0.001
                      AndHighLow      993.72      (3.9%)     1036.31      (4.2%)    4.3% (  -3% -   12%) 0.001
                    OrHighNotLow      456.78      (3.9%)      476.92      (3.4%)    4.4% (  -2% -   12%) 0.000
                   OrNotHighHigh      273.36      (3.7%)      287.00      (3.4%)    5.0% (  -2% -   12%) 0.000
                       OrHighMed      243.28      (3.2%)      255.85      (3.3%)    5.2% (  -1% -   12%) 0.000
                CountAndHighHigh       49.56      (4.2%)       56.33      (2.1%)   13.6% (   7% -   20%) 0.000

Contributor Author

jpountz commented Oct 28, 2024

Here's wikimediumall on a c7i.2xlarge instance that supports AVX512:

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                          IntNRQ       28.57     (13.8%)       27.57     (14.7%)   -3.5% ( -28% -   28%) 0.437
            HighTermTitleBDVSort        4.59      (5.9%)        4.50      (3.8%)   -1.9% ( -10% -    8%) 0.212
                       OrHighMed      104.54      (3.5%)      102.60      (4.4%)   -1.9% (  -9% -    6%) 0.137
                 CountOrHighHigh       27.80      (4.6%)       27.40      (6.6%)   -1.4% ( -12% -   10%) 0.425
                       CountTerm     3894.27      (9.2%)     3851.05      (9.8%)   -1.1% ( -18% -   19%) 0.711
                    OrHighNotLow      466.67      (5.3%)      462.00      (6.2%)   -1.0% ( -11% -   11%) 0.582
                       OrHighLow      351.91      (4.0%)      348.61      (6.2%)   -0.9% ( -10% -    9%) 0.570
           HighTermDayOfYearSort      662.01      (4.6%)      657.47      (5.7%)   -0.7% ( -10% -   10%) 0.674
                    OrHighNotMed      398.60      (3.7%)      396.54      (6.5%)   -0.5% ( -10% -   10%) 0.759
                          OrMany        5.93      (4.9%)        5.91      (5.5%)   -0.4% ( -10% -   10%) 0.802
                  CountOrHighMed       56.55      (4.3%)       56.43      (4.6%)   -0.2% (  -8% -    9%) 0.877
                      OrHighRare      128.24      (4.4%)      128.13      (5.0%)   -0.1% (  -9% -    9%) 0.956
               HighTermMonthSort     1227.54      (5.3%)     1227.05      (5.9%)   -0.0% ( -10% -   11%) 0.982
                        Wildcard       63.11      (4.6%)       63.12      (5.1%)    0.0% (  -9% -   10%) 0.991
               HighTermTitleSort      102.24      (6.1%)      102.44      (5.0%)    0.2% ( -10% -   12%) 0.912
                    OrNotHighLow      375.52      (4.2%)      377.10      (5.8%)    0.4% (  -9% -   10%) 0.794
                         Prefix3      279.44      (4.1%)      280.66      (5.4%)    0.4% (  -8% -   10%) 0.773
                        Or3Terms       71.44      (3.5%)       71.86      (4.7%)    0.6% (  -7% -    9%) 0.654
                        HighTerm      520.73      (4.8%)      523.93      (5.7%)    0.6% (  -9% -   11%) 0.712
                        PKLookup      133.95      (3.7%)      134.86      (4.8%)    0.7% (  -7% -    9%) 0.619
              Or2Terms2StopWords       62.90      (3.7%)       63.49      (3.9%)    0.9% (  -6% -    8%) 0.431
                     OrStopWords       10.04     (10.1%)       10.15      (7.3%)    1.1% ( -14% -   20%) 0.693
                    OrNotHighMed      255.32      (5.0%)      258.31      (5.4%)    1.2% (  -8% -   12%) 0.479
                         LowTerm      466.43      (4.9%)      472.54      (5.9%)    1.3% (  -9% -   12%) 0.444
             And2Terms2StopWords       62.80      (3.6%)       63.92      (6.0%)    1.8% (  -7% -   11%) 0.251
                      TermDTSort      286.00      (3.2%)      291.35      (4.9%)    1.9% (  -6% -   10%) 0.157
                      AndHighLow      528.23      (4.0%)      538.88      (5.6%)    2.0% (  -7% -   12%) 0.191
                   OrHighNotHigh      275.41      (4.7%)      282.18      (6.1%)    2.5% (  -8% -   14%) 0.157
                      OrHighHigh       37.26      (7.6%)       38.33      (6.5%)    2.9% ( -10% -   18%) 0.201
                         MedTerm      509.40      (4.7%)      524.29      (6.6%)    2.9% (  -8% -   14%) 0.107
                    AndStopWords        8.97      (4.9%)        9.27      (6.5%)    3.4% (  -7% -   15%) 0.064
                      AndHighMed       85.68      (4.6%)       89.16      (5.8%)    4.1% (  -6% -   15%) 0.014
                       And3Terms       79.84      (3.8%)       83.53      (5.3%)    4.6% (  -4% -   14%) 0.001
                   OrNotHighHigh      256.91      (4.5%)      268.87      (6.6%)    4.7% (  -6% -   16%) 0.009
                     AndHighHigh       28.73      (6.2%)       31.71      (7.5%)   10.4% (  -3% -   25%) 0.000
                 CountAndHighMed       66.60      (3.9%)       78.40      (5.2%)   17.7% (   8% -   27%) 0.000
                CountAndHighHigh       16.96      (3.5%)       21.67      (6.1%)   27.8% (  17% -   38%) 0.000

Contributor Author

jpountz commented Oct 29, 2024

I plan on merging this change soon, and looking into moving postings back to int[] arrays next to hopefully get benefits from having 2x more lanes that can be compared at once.

@jpountz jpountz merged commit 3041af7 into apache:main Oct 30, 2024
3 checks passed
@jpountz jpountz deleted the speedup_advance_v2 branch October 30, 2024 11:51
Contributor Author

jpountz commented Oct 31, 2024

Nightly benchmarks just picked up the change with a mix of speedups and slowdowns: https://benchmarks.mikemccandless.com/2024.10.30.18.12.23.html. Here are the main ones I'm seeing:

Speedups:

  • CountAndHighHigh: +5%
  • AndHighHighDayTaxoFacets: +3%
  • CountAndHighMed: +2.5%

Slowdowns:

  • Phrase: -3.5%
  • AndHighOrMedMed: -3%
  • OrHighRare: -3%
  • AndHighHigh: -3%
  • AndHighMed: -2.5%

I'm a bit surprised/disappointed at the AndHighHigh/AndHighMed slowdown since this change is supposed to help conjunctions, and the counting queries proved it helps. I'll look into it.

Contributor Author

jpountz commented Oct 31, 2024

If you check out the data at #13692 (comment), AndHighHigh and AndHighMed tend to advance a bit further than CountAndHighHigh and CountAndHighMed, so that might be the issue. I am tempted to not touch anything yet and see how nightlies react to #13968, which should allow checking 2x more values at once.

jpountz added a commit that referenced this pull request Oct 31, 2024
PR #13692 tried to speed up advancing by using branchless binary search, but while this yielded a speedup on my machine, this yielded a slowdown on nightly benchmarks.

This PR tries a different approach using vectorization. Experimentation suggests that it speeds up queries that advance to the next few doc IDs, such as `AndHighHigh`.