
Move postings back to int[] to take advantage of having more lanes per vector. #13968

Merged: 6 commits merged into apache:main on Nov 1, 2024

Conversation

jpountz (Contributor) commented Oct 31, 2024

In Lucene 8.4, we updated postings to work on long[] arrays internally. This allowed us to work around the lack of explicit vectorization support in the JVM (auto-vectorization doesn't detect all the scenarios that we would like to handle) by, for instance, summing up two integers in one operation.

With explicit vectorization now available, it looks like we can get more benefit from the ability to compare multiple integers in one operation than from summing up two integers in one operation. Moving back to ints lets us compare 2x more integers at once than with longs.

The diff is large because of the codec dance: Lucene912PostingsFormat and Lucene100Codec moved to lucene/backward-codecs, and a new Lucene101PostingsFormat is a copy of the previous Lucene912PostingsFormat with a move from long[] arrays to int[] arrays and changes to the on-disk format for blocks of packed integers.

Note that DataInput#readGroupVInt and VectorUtilSupport#findNextGEQ have been cleaned up to only support int[] and no longer long[].
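
To make the "more lanes per vector" point concrete, here is a minimal sketch (not Lucene's actual VectorUtilSupport implementation) of a vectorized findNextGEQ over an int[] using the JDK's incubating Vector API. On 256-bit hardware, each step compares 8 ints, whereas a long[] variant would only compare 4 longs per step.

// Minimal sketch; requires --add-modules jdk.incubator.vector. For illustration only.
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class FindNextGEQSketch {
  private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

  /** Returns the index of the first value in buffer[from..to) that is >= target, or to if none. */
  static int findNextGEQ(int[] buffer, int target, int from, int to) {
    int i = from;
    // Vectorized main loop: compare SPECIES.length() ints per iteration (8 on 256-bit hardware).
    for (int bound = from + SPECIES.loopBound(to - from); i < bound; i += SPECIES.length()) {
      IntVector vec = IntVector.fromArray(SPECIES, buffer, i);
      int firstMatch = vec.compare(VectorOperators.GE, target).firstTrue();
      if (firstMatch < SPECIES.length()) {
        return i + firstMatch;
      }
    }
    // Scalar tail for the remaining values.
    for (; i < to; i++) {
      if (buffer[i] >= target) {
        return i;
      }
    }
    return to;
  }
}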

jpountz added this to the 10.1.0 milestone on Oct 31, 2024
jpountz changed the title from "Move postings back to int[]." to "Move postings back to int[] to take advantage of having more lanes per vector." on Oct 31, 2024
jpountz (Contributor, Author) commented Oct 31, 2024

Here is a luceneutil run against wikibigall:

                            Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                 CountOrHighHigh       76.20      (1.0%)       74.54      (0.9%)   -2.2% (  -4% -    0%) 0.000
                  CountOrHighMed      143.11      (1.2%)      141.35      (1.0%)   -1.2% (  -3% -    0%) 0.001
                CountAndHighHigh       57.54      (1.1%)       57.30      (0.9%)   -0.4% (  -2% -    1%) 0.189
                      TermDTSort      376.04      (6.4%)      374.67      (5.7%)   -0.4% ( -11% -   12%) 0.853
                     AndHighHigh       91.53      (1.4%)       91.32      (1.9%)   -0.2% (  -3% -    3%) 0.669
           HighTermDayOfYearSort      901.22      (3.8%)      899.77      (3.4%)   -0.2% (  -7% -    7%) 0.890
                      AndHighLow     1205.47      (1.7%)     1203.76      (2.0%)   -0.1% (  -3% -    3%) 0.810
                       OrHighMed      206.20      (2.5%)      206.34      (2.7%)    0.1% (  -5% -    5%) 0.935
                    OrNotHighLow     1148.24      (2.1%)     1149.12      (2.0%)    0.1% (  -3% -    4%) 0.908
                       OrHighLow      756.64      (1.7%)      757.29      (1.6%)    0.1% (  -3% -    3%) 0.872
                         MedTerm      742.62      (2.1%)      743.99      (2.2%)    0.2% (  -4% -    4%) 0.793
                      AndHighMed      184.06      (1.4%)      184.45      (1.3%)    0.2% (  -2% -    2%) 0.622
                        PKLookup      271.47      (2.2%)      272.19      (2.5%)    0.3% (  -4% -    5%) 0.728
                      OrHighRare      280.73      (4.2%)      281.50      (5.4%)    0.3% (  -8% -   10%) 0.861
                   OrHighNotHigh      262.74      (2.8%)      263.51      (2.6%)    0.3% (  -4% -    5%) 0.736
                    AndStopWords       32.53      (4.5%)       32.64      (4.2%)    0.3% (  -8% -    9%) 0.806
                        HighTerm      443.84      (2.8%)      445.95      (2.1%)    0.5% (  -4% -    5%) 0.551
                    OrHighNotMed      485.11      (3.0%)      487.66      (3.4%)    0.5% (  -5% -    7%) 0.612
                         LowTerm     1138.20      (2.7%)     1144.70      (2.8%)    0.6% (  -4% -    6%) 0.519
                          Fuzzy2       75.60      (2.1%)       76.05      (2.5%)    0.6% (  -3% -    5%) 0.429
                        Wildcard      116.11      (3.2%)      116.86      (4.1%)    0.6% (  -6% -    8%) 0.597
                      OrHighHigh       93.59      (3.6%)       94.24      (3.4%)    0.7% (  -6% -    7%) 0.538
                   OrNotHighHigh      261.04      (2.8%)      262.93      (2.4%)    0.7% (  -4% -    6%) 0.396
                          Fuzzy1       80.27      (2.6%)       80.86      (2.6%)    0.7% (  -4% -    6%) 0.381
             And2Terms2StopWords      161.95      (2.6%)      163.18      (2.5%)    0.8% (  -4% -    5%) 0.354
            HighTermTitleBDVSort       15.67      (6.6%)       15.80      (5.8%)    0.8% ( -10% -   14%) 0.677
                    OrHighNotLow      447.52      (4.0%)      451.75      (4.1%)    0.9% (  -6% -    9%) 0.471
                       And3Terms      178.15      (3.2%)      179.88      (2.8%)    1.0% (  -4% -    7%) 0.319
              Or2Terms2StopWords      164.13      (3.7%)      166.04      (3.4%)    1.2% (  -5% -    8%) 0.312
                     OrStopWords       36.12      (6.7%)       36.55      (6.1%)    1.2% ( -10% -   14%) 0.564
                        Or3Terms      178.00      (3.7%)      180.14      (3.5%)    1.2% (  -5% -    8%) 0.309
                         Prefix3       70.94      (4.1%)       71.81      (8.1%)    1.2% ( -10% -   13%) 0.554
                          IntNRQ      179.05      (5.1%)      181.32      (5.4%)    1.3% (  -8% -   12%) 0.459
               HighTermMonthSort     3413.39      (2.2%)     3459.32      (3.0%)    1.3% (  -3% -    6%) 0.111
                    OrNotHighMed      384.09      (3.2%)      389.69      (2.5%)    1.5% (  -4% -    7%) 0.112
                          OrMany       19.16      (3.5%)       19.44      (3.6%)    1.5% (  -5% -    8%) 0.203
                       CountTerm     9388.28      (3.3%)     9587.31      (4.2%)    2.1% (  -5% -    9%) 0.082
               HighTermTitleSort      135.48      (1.9%)      139.76      (3.3%)    3.2% (  -1% -    8%) 0.000
                 CountAndHighMed      160.02      (1.3%)      168.58      (1.3%)    5.4% (   2% -    7%) 0.000

The CountAndHighMed and HighTermTitleSort speedups are consistently reproducible. I believe that the former is due to being able to compare 8 lanes at once instead of 4, and the latter is due to the better memory efficiency of the postings reader now that it stores two int[128] instead of two long[128] (this task creates a few tens of postings readers under the hood).

The CountOrHighHigh and CountOrHighMed slowdowns are not consistently reproducible, but they may be real since we lost a prefix-sum optimization without gaining new benefits in exchange.

The simplicity and memory efficiency of working with int[] instead of long[], coupled with the speedup to advancing, make this change a good trade-off in my opinion.
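
As background for the prefix-sum optimization mentioned above, here is a rough, hypothetical illustration (not Lucene's actual decoding code) of the kind of trick that long[] enabled: when two values are packed into one long and the low half cannot carry into the high half, a single 64-bit addition sums both pairs at once.

// Hypothetical illustration of summing two packed integers with one long addition.
final class PackedAddSketch {

  /** Packs two non-negative ints into one long: high in the upper 32 bits, low in the lower 32 bits. */
  static long pack(int high, int low) {
    return ((long) high << 32) | (low & 0xFFFF_FFFFL);
  }

  static int high(long packed) {
    return (int) (packed >>> 32);
  }

  static int low(long packed) {
    return (int) packed;
  }

  public static void main(String[] args) {
    long a = pack(3, 7);
    long b = pack(10, 20);
    // One 64-bit add sums both lanes, as long as the low lane cannot overflow into the high lane.
    long sum = a + b;
    System.out.println(high(sum) + " " + low(sum)); // prints "13 27"
  }
}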

jpountz (Contributor, Author) commented Oct 31, 2024

I plan on merging tomorrow, so that we have two data points with longs on nightly benchmarks before seeing how it performs with ints.

gsmiller (Contributor) left a comment:

Seems good. Tried to look through the actual changes (and not all the codec wiring stuff).

Inline review comment on the new splitInts method (diff context):

   * cIndex}.
   * </ul>
   */
  public void splitInts(

gsmiller (Contributor): Should we drop #splitLongs? (Also, should we add @lucene.internal to this class so we're free to drop public methods?)

jpountz (Contributor, Author): Thanks for catching this; I had meant to do it but obviously missed some bits.

jpountz (Contributor, Author): FWIW, this class may only be used from a very small set of explicitly named classes (see org.apache.lucene.internal.vectorization.VectorizationProvider#VALID_CALLERS), so there is no risk that users call this API.
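
For readers unfamiliar with that mechanism, here is a hypothetical sketch (class name and allow-list entry invented for illustration, not Lucene's actual VectorizationProvider code) of how a caller allow-list check based on stack walking can work:

// Hypothetical sketch of a caller allow-list check via StackWalker; the entry in
// VALID_CALLERS below is an invented example, not the real Lucene allow-list.
import java.util.Set;

final class CallerCheckSketch {
  private static final Set<String> VALID_CALLERS =
      Set.of("org.example.internal.SomeAllowedCaller"); // invented entry for illustration

  /** Throws if the caller of the guarded method (two frames above this one) is not on the allow-list. */
  static void ensureCaller() {
    String caller =
        StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE)
            .walk(frames -> frames.skip(2).findFirst())
            .map(frame -> frame.getDeclaringClass().getName())
            .orElse("<unknown>");
    if (!VALID_CALLERS.contains(caller)) {
      throw new UnsupportedOperationException("Illegal caller: " + caller);
    }
  }
}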

jpountz merged commit cfdd20f into apache:main on Nov 1, 2024 (3 checks passed).
jpountz deleted the move_postings_back_to_int_arrays branch on November 1, 2024 at 06:49.
jpountz (Contributor, Author) commented Nov 4, 2024

Nightly benchmarks confirmed a speedup from the combination of #13958 (SIMD for advancing within a block) and this PR (CountAndHighHigh, CountAndHighMed).

Exhaustive evaluation of disjunctive queries looks stable or barely slower (CountOrHighHigh, CountOrHighMed), even though we could have expected a slowdown from the loss of some optimizations in block decoding.

jpountz added a commit that referenced this pull request Nov 4, 2024: Move postings back to int[] to take advantage of having more lanes per vector. (#13968)
javanna added a commit to javanna/elasticsearch that referenced this pull request Nov 6, 2024
javanna added a commit to javanna/elasticsearch that referenced this pull request Nov 6, 2024
javanna added a commit to javanna/elasticsearch that referenced this pull request Nov 6, 2024
javanna added a commit to elastic/elasticsearch that referenced this pull request Nov 8, 2024
benchaplin pushed a commit to benchaplin/lucene that referenced this pull request Dec 31, 2024: Move postings back to int[] to take advantage of having more lanes per vector. (apache#13968)