
Move postings back to int[] to take advantage of having more lanes per vector. #13968

Merged: 6 commits merged into apache:main on Nov 1, 2024

Conversation

jpountz (Contributor) commented Oct 31, 2024

In Lucene 8.4, we updated postings to work on long[] arrays internally. This allowed us to work around the lack of explicit vectorization support in the JVM (auto-vectorization doesn't detect all the scenarios that we would like to handle) by, for instance, summing up two integers in one operation.

With explicit vectorization now available, it looks like we can get more benefit from the ability to compare multiple integers in one operation than from summing up two integers in one operation. Moving back to ints lets us compare 2x more integers at once than with longs.

The diff is large because of the codec dance: Lucene912PostingsFormat and Lucene100Codec moved to lucene/backward-codecs, and a new Lucene101PostingsFormat is a copy of the previous Lucene912PostingsFormat with a move from long[] arrays to int[] arrays and changes to the on-disk format for blocks of packed integers.

Note that DataInput#readGroupVInt and VectorUtilSupport#findNextGEQ have been cleaned up to only support int[] and no longer long[].
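
To make the "more lanes per vector" point concrete, here is a minimal sketch (not Lucene's actual VectorUtilSupport implementation) of a vectorized findNextGEQ over an int[] using the JDK's incubating Vector API. On 256-bit hardware, each step compares 8 ints, whereas a long[] variant would only compare 4 longs per step.

// Minimal sketch; requires --add-modules jdk.incubator.vector. For illustration only.
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class FindNextGEQSketch {
  private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

  /** Returns the index of the first value in buffer[from..to) that is >= target, or to if none. */
  static int findNextGEQ(int[] buffer, int target, int from, int to) {
    int i = from;
    // Vectorized main loop: compare SPECIES.length() ints per iteration (8 on 256-bit hardware).
    for (int bound = from + SPECIES.loopBound(to - from); i < bound; i += SPECIES.length()) {
      IntVector vec = IntVector.fromArray(SPECIES, buffer, i);
      int firstMatch = vec.compare(VectorOperators.GE, target).firstTrue();
      if (firstMatch < SPECIES.length()) {
        return i + firstMatch;
      }
    }
    // Scalar tail for the remaining values.
    for (; i < to; i++) {
      if (buffer[i] >= target) {
        return i;
      }
    }
    return to;
  }
}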

jpountz added this to the 10.1.0 milestone on Oct 31, 2024
jpountz changed the title from "Move postings back to int[]." to "Move postings back to int[] to take advantage of having more lanes per vector." on Oct 31, 2024
jpountz (Contributor, Author) commented Oct 31, 2024

Here is a luceneutil run against wikibigall:

                            Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                 CountOrHighHigh       76.20      (1.0%)       74.54      (0.9%)   -2.2% (  -4% -    0%) 0.000
                  CountOrHighMed      143.11      (1.2%)      141.35      (1.0%)   -1.2% (  -3% -    0%) 0.001
                CountAndHighHigh       57.54      (1.1%)       57.30      (0.9%)   -0.4% (  -2% -    1%) 0.189
                      TermDTSort      376.04      (6.4%)      374.67      (5.7%)   -0.4% ( -11% -   12%) 0.853
                     AndHighHigh       91.53      (1.4%)       91.32      (1.9%)   -0.2% (  -3% -    3%) 0.669
           HighTermDayOfYearSort      901.22      (3.8%)      899.77      (3.4%)   -0.2% (  -7% -    7%) 0.890
                      AndHighLow     1205.47      (1.7%)     1203.76      (2.0%)   -0.1% (  -3% -    3%) 0.810
                       OrHighMed      206.20      (2.5%)      206.34      (2.7%)    0.1% (  -5% -    5%) 0.935
                    OrNotHighLow     1148.24      (2.1%)     1149.12      (2.0%)    0.1% (  -3% -    4%) 0.908
                       OrHighLow      756.64      (1.7%)      757.29      (1.6%)    0.1% (  -3% -    3%) 0.872
                         MedTerm      742.62      (2.1%)      743.99      (2.2%)    0.2% (  -4% -    4%) 0.793
                      AndHighMed      184.06      (1.4%)      184.45      (1.3%)    0.2% (  -2% -    2%) 0.622
                        PKLookup      271.47      (2.2%)      272.19      (2.5%)    0.3% (  -4% -    5%) 0.728
                      OrHighRare      280.73      (4.2%)      281.50      (5.4%)    0.3% (  -8% -   10%) 0.861
                   OrHighNotHigh      262.74      (2.8%)      263.51      (2.6%)    0.3% (  -4% -    5%) 0.736
                    AndStopWords       32.53      (4.5%)       32.64      (4.2%)    0.3% (  -8% -    9%) 0.806
                        HighTerm      443.84      (2.8%)      445.95      (2.1%)    0.5% (  -4% -    5%) 0.551
                    OrHighNotMed      485.11      (3.0%)      487.66      (3.4%)    0.5% (  -5% -    7%) 0.612
                         LowTerm     1138.20      (2.7%)     1144.70      (2.8%)    0.6% (  -4% -    6%) 0.519
                          Fuzzy2       75.60      (2.1%)       76.05      (2.5%)    0.6% (  -3% -    5%) 0.429
                        Wildcard      116.11      (3.2%)      116.86      (4.1%)    0.6% (  -6% -    8%) 0.597
                      OrHighHigh       93.59      (3.6%)       94.24      (3.4%)    0.7% (  -6% -    7%) 0.538
                   OrNotHighHigh      261.04      (2.8%)      262.93      (2.4%)    0.7% (  -4% -    6%) 0.396
                          Fuzzy1       80.27      (2.6%)       80.86      (2.6%)    0.7% (  -4% -    6%) 0.381
             And2Terms2StopWords      161.95      (2.6%)      163.18      (2.5%)    0.8% (  -4% -    5%) 0.354
            HighTermTitleBDVSort       15.67      (6.6%)       15.80      (5.8%)    0.8% ( -10% -   14%) 0.677
                    OrHighNotLow      447.52      (4.0%)      451.75      (4.1%)    0.9% (  -6% -    9%) 0.471
                       And3Terms      178.15      (3.2%)      179.88      (2.8%)    1.0% (  -4% -    7%) 0.319
              Or2Terms2StopWords      164.13      (3.7%)      166.04      (3.4%)    1.2% (  -5% -    8%) 0.312
                     OrStopWords       36.12      (6.7%)       36.55      (6.1%)    1.2% ( -10% -   14%) 0.564
                        Or3Terms      178.00      (3.7%)      180.14      (3.5%)    1.2% (  -5% -    8%) 0.309
                         Prefix3       70.94      (4.1%)       71.81      (8.1%)    1.2% ( -10% -   13%) 0.554
                          IntNRQ      179.05      (5.1%)      181.32      (5.4%)    1.3% (  -8% -   12%) 0.459
               HighTermMonthSort     3413.39      (2.2%)     3459.32      (3.0%)    1.3% (  -3% -    6%) 0.111
                    OrNotHighMed      384.09      (3.2%)      389.69      (2.5%)    1.5% (  -4% -    7%) 0.112
                          OrMany       19.16      (3.5%)       19.44      (3.6%)    1.5% (  -5% -    8%) 0.203
                       CountTerm     9388.28      (3.3%)     9587.31      (4.2%)    2.1% (  -5% -    9%) 0.082
               HighTermTitleSort      135.48      (1.9%)      139.76      (3.3%)    3.2% (  -1% -    8%) 0.000
                 CountAndHighMed      160.02      (1.3%)      168.58      (1.3%)    5.4% (   2% -    7%) 0.000

The CountAndHighMed and HighTermTitleSort speedups are consistently reproducible. I believe that the former is due to being able to compare 8 lanes at once instead of 4, and the latter is due to the better memory efficiency of the postings reader now that it stores two int[128] instead of two long[128] (this task creates a few tens of postings readers under the hood).

The CountOrHighHigh and CountOrHighMed slowdowns are not consistently reproducible, but they may be real since we lost a prefix-sum optimization without gaining new benefits in exchange.

The simplicity and memory efficiency of working with int[] instead of long[], coupled with the speedup to advancing, make this change a good trade-off in my opinion.
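
As background for the prefix-sum optimization mentioned above, here is a rough, hypothetical illustration (not Lucene's actual decoding code) of the kind of trick that long[] enabled: when two values are packed into one long and the low half cannot carry into the high half, a single 64-bit addition sums both pairs at once.

// Hypothetical illustration of summing two packed integers with one long addition.
final class PackedAddSketch {

  /** Packs two non-negative ints into one long: high in the upper 32 bits, low in the lower 32 bits. */
  static long pack(int high, int low) {
    return ((long) high << 32) | (low & 0xFFFF_FFFFL);
  }

  static int high(long packed) {
    return (int) (packed >>> 32);
  }

  static int low(long packed) {
    return (int) packed;
  }

  public static void main(String[] args) {
    long a = pack(3, 7);
    long b = pack(10, 20);
    // One 64-bit add sums both lanes, as long as the low lane cannot overflow into the high lane.
    long sum = a + b;
    System.out.println(high(sum) + " " + low(sum)); // prints "13 27"
  }
}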

jpountz (Contributor, Author) commented Oct 31, 2024

I plan on merging tomorrow, so that we have two data points with longs on nightly benchmarks before seeing how it performs with ints.

gsmiller (Contributor) left a comment:

Seems good. Tried to look through the actual changes (and not all the codec wiring stuff).

Inline review comment on the new splitInts method (diff context):

   * cIndex}.
   * </ul>
   */
  public void splitInts(

gsmiller (Contributor): Should we drop #splitLongs? (Also, should we add @lucene.internal to this class so we're free to drop public methods?)

jpountz (Contributor, Author): Thanks for catching this; I had meant to do it but obviously missed some bits.

jpountz (Contributor, Author): FWIW, this class may only be used from a very small set of explicitly named classes (see org.apache.lucene.internal.vectorization.VectorizationProvider#VALID_CALLERS), so there is no risk that users call this API.
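
For readers unfamiliar with that mechanism, here is a hypothetical sketch (class name and allow-list entry invented for illustration, not Lucene's actual VectorizationProvider code) of how a caller allow-list check based on stack walking can work:

// Hypothetical sketch of a caller allow-list check via StackWalker; the entry in
// VALID_CALLERS below is an invented example, not the real Lucene allow-list.
import java.util.Set;

final class CallerCheckSketch {
  private static final Set<String> VALID_CALLERS =
      Set.of("org.example.internal.SomeAllowedCaller"); // invented entry for illustration

  /** Throws if the caller of the guarded method (two frames above this one) is not on the allow-list. */
  static void ensureCaller() {
    String caller =
        StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE)
            .walk(frames -> frames.skip(2).findFirst())
            .map(frame -> frame.getDeclaringClass().getName())
            .orElse("<unknown>");
    if (!VALID_CALLERS.contains(caller)) {
      throw new UnsupportedOperationException("Illegal caller: " + caller);
    }
  }
}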

jpountz merged commit cfdd20f into apache:main on Nov 1, 2024 (3 checks passed).
jpountz deleted the move_postings_back_to_int_arrays branch on November 1, 2024 at 06:49.
jpountz (Contributor, Author) commented Nov 4, 2024

Nightly benchmarks confirmed a speedup from the combination of #13958 (SIMD for advancing within a block) and this PR (CountAndHighHigh, CountAndHighMed).

Exhaustive evaluation of disjunctive queries looks stable or barely slower (CountOrHighHigh, CountOrHighMed), even though we could have expected a slowdown from the loss of some optimizations in block decoding.

jpountz added a commit that referenced this pull request Nov 4, 2024: Move postings back to int[] to take advantage of having more lanes per vector. (#13968)
javanna added a commit to javanna/elasticsearch that referenced this pull request Nov 6, 2024
javanna added a commit to javanna/elasticsearch that referenced this pull request Nov 6, 2024
javanna added a commit to javanna/elasticsearch that referenced this pull request Nov 6, 2024
javanna added a commit to elastic/elasticsearch that referenced this pull request Nov 8, 2024
benchaplin pushed a commit to benchaplin/lucene that referenced this pull request Dec 31, 2024: Move postings back to int[] to take advantage of having more lanes per vector. (apache#13968)