Explore within-block skipping for postings #12486

mikemccand · 2023-08-02T19:52:08Z

Description

One of the differences between Tantivy and Lucene is that Tantivy supports within-block skipping (branchless binary search to locate a target docid inside a block of 128 postings), but Lucene skips only to block boundaries, and then does a linear scan within. Lucene is also doing the "accumulate docid deltas into the absolute docid" too in this loop, but I guess Tantivy does this separately somehow?

Anyway, we could explore within-block skipping as well -- it might make conjunctive queries where both terms have roughly the same cardinality quite a bit faster?
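To make the idea concrete, here is a minimal sketch of a within-block binary search over one block of 128 postings. It assumes the block already holds absolute, sorted, non-negative doc IDs in an `int[]`; the class and method names (`WithinBlockSkip`, `advanceWithinBlock`) are made up for illustration and are neither Lucene's nor Tantivy's actual code. The loop uses arithmetic instead of `if` so the source is branch-free, though whether the JIT emits branch-free machine code is not guaranteed.

```java
final class WithinBlockSkip {
    static final int BLOCK_SIZE = 128; // must be a power of two for this loop

    /**
     * Returns the index of the first entry >= target in a sorted block of
     * BLOCK_SIZE absolute doc IDs, or BLOCK_SIZE if every entry is < target.
     * Assumes all values (and target) are non-negative, so the subtraction
     * below cannot overflow.
     */
    static int advanceWithinBlock(int[] docs, int target) {
        int idx = 0;
        for (int step = BLOCK_SIZE >>> 1; step > 0; step >>>= 1) {
            int probe = docs[idx + step - 1];
            // less is 1 when probe < target, else 0 (sign bit of the difference)
            int less = (probe - target) >>> 31;
            // -less is an all-ones mask when less == 1, so this adds step or 0
            idx += step & -less;
        }
        // idx is now in [0, BLOCK_SIZE - 1]; one last comparison settles it
        idx += (docs[idx] - target) >>> 31;
        return idx;
    }
}
```

A caller would still skip to the right block first, then use the returned index to read the doc ID (and the matching frequency) from the decoded block.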

jpountz · 2023-10-31T12:52:05Z

> Lucene is also doing the "accumulate docid deltas into the absolute docid" too in this loop, but I guess Tantivy does this separately somehow?

I believe Tantivy does the same, except that it can take advantage of SIMD to accumulate docid deltas into the absolute docid (if it did not accumulate deltas up-front, it could not run a branchless binary search later on). I tried to look into whether we can do the same with Panama, but last time I checked it doesn't give ways to use _mm_slli_si128, which prevents us from making a faster prefix sum through vectorization: https://github.com/jpountz/vectorized-prefix-sum.
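For illustration, the up-front accumulation step can be sketched as a plain scalar prefix sum. This is only the scalar baseline, not the vectorized variant discussed above, and the names (`DeltaDecoder`, `prefixSum`, `base`) are hypothetical. After it runs, the block holds absolute doc IDs, which is the precondition for any binary search inside the block.

```java
final class DeltaDecoder {
    /**
     * Turns a block of doc ID deltas into absolute doc IDs, in place.
     * base is the last doc ID of the previous block (or the starting doc ID
     * offset for the first block); both the name and the convention are
     * assumptions of this sketch.
     */
    static void prefixSum(int[] deltas, int base) {
        int doc = base;
        for (int i = 0; i < deltas.length; i++) {
            doc += deltas[i]; // running sum carries a dependency across iterations
            deltas[i] = doc;
        }
    }
}
```

The loop-carried dependency on `doc` is what makes this hard to speed up without a vector shift such as `_mm_slli_si128`.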

For reference, there is also this old @mkhludnev idea about encoding dense postings lists as bitsets, which would naturally help with skipping: #6116 (or can we do it on a per-block basis?). And more generally, there are some formats that are better at skipping within blocks like Elias-Fano.
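As a rough sketch of why a bitset helps with skipping, here is what a per-block bitset could look like. The layout is purely an assumption for illustration (a block covering a fixed doc ID range starting at `base`, one bit per doc ID); it is not the encoding proposed in #6116, and `BitSetBlock` is a made-up name. Advancing to a target becomes a word lookup, a mask, and a trailing-zeros count, with no delta decoding at all.

```java
final class BitSetBlock {
    private final int base;     // first doc ID covered by this block (assumed layout)
    private final long[] words; // one bit per doc ID in [base, base + rangeSize)

    BitSetBlock(int base, int[] sortedDocs, int rangeSize) {
        this.base = base;
        this.words = new long[(rangeSize + 63) >>> 6];
        for (int doc : sortedDocs) {
            int bit = doc - base;
            words[bit >>> 6] |= 1L << bit; // long shifts use only the low 6 bits
        }
    }

    /** Returns the first doc ID >= target in this block, or -1 if there is none. */
    int advance(int target) {
        int bit = Math.max(0, target - base);
        int w = bit >>> 6;
        if (w >= words.length) {
            return -1;
        }
        // Clear the bits below the target inside the first word
        long word = words[w] & (-1L << bit);
        while (true) {
            if (word != 0) {
                return base + (w << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++w == words.length) {
                return -1;
            }
            word = words[w];
        }
    }
}
```

This only pays off when the block is dense enough that the bitset is no larger than the packed deltas, which is why a per-block decision might make sense.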

jpountz · 2025-01-13T09:09:33Z

I think we can consider this issue as closed via #13958, and possibly further improved via #14133.

mikemccand added the type:enhancement label Aug 2, 2023

mikemccand mentioned this issue Oct 31, 2023

Adding option to codec to disable patching in Lucene's PFOR encoding #12696

Closed

jpountz closed this as completed Jan 13, 2025
