You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
One of the differences between Tantivy and Lucene is that Tantivy supports within-block skipping (branchless binary search to locate a target docid inside a block of 128 postings), but Lucene skips only to block boundaries, and then does a linear scan within. Lucene is also doing the "accumulate docid deltas into the absolute docid" too in this loop, but I guess Tantivy does this separately somehow?
Anyway, we could explore within-block skipping as well -- it might make conjunctive queries where both terms have roughly the same cardinality, quite a bit faster?
The text was updated successfully, but these errors were encountered:
Lucene is also doing the "accumulate docid deltas into the absolute docid" too in this loop, but I guess Tantivy does this separately somehow?
I believe Tantivy does the same, except that it can take advantage of SIMD to accumulate docid deltas into the absolute docid (if it did not accumulate deltas up-front, it could not run a branchless binary search later on). I tried to look into whether we can do the same with Panama, but last time I checked it doesn't give ways to use _mm_slli_si128, which prevents us from making a faster prefix sum through vectorization: https://github.com/jpountz/vectorized-prefix-sum.
For reference, there is also this old @mkhludnev idea about encoding dense postings lists as bitsets, which would naturally help with skipping: #6116 (or can we do it on a per-block basis?). And more generally, there are some formats that are better at skipping within blocks like Elias-Fano.
Description
One of the differences between Tantivy and Lucene is that Tantivy supports within-block skipping (branchless binary search to locate a target docid inside a block of 128 postings), but Lucene skips only to block boundaries, and then does a linear scan within. Lucene is also doing the "accumulate docid deltas into the absolute docid" too in this loop, but I guess Tantivy does this separately somehow?
Anyway, we could explore within-block skipping as well -- it might make conjunctive queries where both terms have roughly the same cardinality, quite a bit faster?
The text was updated successfully, but these errors were encountered: