Fix IntegerOverflow exception in postings encoding as group-varint #13376
This change keeps the input values of |
The essence of this issue is how to deal with the integer value with the sign bit as 1 (like this integer overflow case). We have two options.
The first approach feels more reasonable. |
# Conflicts: # lucene/CHANGES.txt
Thanks for looking into it! Your approach works, but I'm tempted to fix it the other way around, by no longer checking if values are in the expected range with |
That's also a good idea! by this approach we can make |
I pushed the requested changes, @jpountz. No rush, just wanted to let you know.
final int v = 1 << 30;
final long[] values = new long[4];
values[0] = v;
values[0] <<= 1; // values[0] = 2147483648 as long, but as int it is -2147483648
Why not do values[0] = 1L << 31 directly?
Fixed.
* @param values the values to write
* @param values the values to write. Note: if original integer is negative, it should also be
* negative as long, not positive which is greater than Integer.MAX_VALUE, that will cause
* integer overflow exception in {@link Math#toIntExact(long)}.
Let's not mention this implementation detail.
* integer overflow exception in {@link Math#toIntExact(long)}.
* integer overflow exception.
This change has been reverted; the current fix approach makes no change to DataOutput.
private static int toInt(long value) {
if (value < 0 || value > 0xFFFFFFFFL) {
throw new ArithmeticException("integer overflow");
}
Maybe use Long.compareUnsigned? (if (Long.compareUnsigned(value, 0xFFFFFFFFL) > 0))
Great!
Thanks for reviewing!
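To illustrate the suggestion in this thread, here is a minimal sketch comparing the two-comparison range check from the diff with the single Long.compareUnsigned check. The class and method names are hypothetical and are not taken from the Lucene source; the sketch only shows that the two predicates agree on a handful of boundary values.

```java
final class UnsignedRangeCheck {
  // Two-comparison form, as quoted in the diff under review.
  static boolean outOfRangeExplicit(long value) {
    return value < 0 || value > 0xFFFFFFFFL;
  }

  // Single unsigned comparison: a negative long has its top bit set, so it
  // compares as larger than 0xFFFFFFFFL when both operands are treated as unsigned.
  static boolean outOfRangeUnsigned(long value) {
    return Long.compareUnsigned(value, 0xFFFFFFFFL) > 0;
  }

  public static void main(String[] args) {
    long[] samples = {
      0L, 1L, 1L << 31, 0xFFFFFFFFL, 0x100000000L, -1L, Long.MIN_VALUE, Long.MAX_VALUE
    };
    for (long v : samples) {
      // Both columns should always match.
      System.out.println(v + " -> " + outOfRangeExplicit(v) + " / " + outOfRangeUnsigned(v));
    }
  }
}
```

Either form accepts every value that fits in 32 unsigned bits, including 1L << 31, and rejects everything else.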
It's hard for me to tell what the expected user impact here is? Does the exception happen because the remainder part of a postings list (after all length 128 blocks are done), which we now encode with GroupVInt, had a docID delta that was I guess because we don't see too many users reporting this, it is likely rare-ish. But is the GroupVInt change released in 9.x? Is this maybe enough to warrant a bugfix release if so? |
Yes, exactly. I guess the docID delta that was
GroupVInt was released in 9.9.0: https://lucene.apache.org/core/9_9_0/changes/Changes.html#v9.9.0.optimizations
+1 to a bugfix release
Can you backport to the 9.10 branch?
Okay, I will backport to 9.10/branch_9x.
…13376) The exception happens because the tail postings list block, which is encoded with GroupVInt, had a docID delta that was >= 1<<30 when the postings are also storing freqs.
Backport completed and added an entry under 9.10.1 Bug Fixes.
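The 1<<30 threshold mentioned in the commit message can be checked with a small sketch. It assumes, as an illustration only, that when freqs are stored the docID delta is shifted left by one bit and the low bit flags freq == 1; the pack helper and class name are hypothetical and are not the Lucene implementation.

```java
final class TailBlockDeltaSketch {
  // Hypothetical helper (an assumption for illustration): combines a docID
  // delta with a one-bit "freq == 1" flag in a single long.
  static long pack(int docDelta, int freq) {
    return ((long) docDelta << 1) | (freq == 1 ? 1 : 0);
  }

  // True if the packed value is representable as a signed 32-bit int.
  static boolean fitsInSignedInt(long packed) {
    return packed >= Integer.MIN_VALUE && packed <= Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    long justBelow = pack((1 << 30) - 1, 1); // largest delta under the threshold
    long atThreshold = pack(1 << 30, 2);     // smallest delta at the threshold

    System.out.println(justBelow + " fits: " + fitsInSignedInt(justBelow));
    System.out.println(atThreshold + " fits: " + fitsInSignedInt(atThreshold));

    try {
      Math.toIntExact(atThreshold);
    } catch (ArithmeticException e) {
      // Reached for any packed value outside the signed int range.
      System.out.println("toIntExact failed: " + e.getMessage());
    }
  }
}
```

Under that assumption, any delta below 1<<30 still packs into a signed int, while a delta of 1<<30 or more needs bit 31 and no longer passes a signed range check.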
Closes: #13373
This exception occurs because a negative integer value is stored as a positive long. In line 376, after a long value is shifted with << 1, if the sign bit of the integer value is 1, the result is a negative number as an int but a positive number as a long. When we store this value as a positive long, Math.toIntExact throws an ArithmeticException.
lucene/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsWriter.java
Lines 373 to 379 in f12e489
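The int/long mismatch described above can be reproduced in isolation. The following is a standalone sketch, not the referenced lines from Lucene99PostingsWriter; the class and helper names are hypothetical. It shows that Math.toIntExact rejects 1L << 31, while a plain narrowing cast keeps the low 32 bits and the original value can be recovered with Integer.toUnsignedLong.

```java
final class NarrowingSketch {
  // Returns true if Math.toIntExact rejects the value with an ArithmeticException.
  static boolean toIntExactThrows(long value) {
    try {
      Math.toIntExact(value);
      return false;
    } catch (ArithmeticException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    long value = 1L << 31; // positive as a long, but bit 31 is the int sign bit

    System.out.println("toIntExact throws: " + toIntExactThrows(value));

    int narrowed = (int) value; // narrowing cast keeps only the low 32 bits
    System.out.println("narrowed is negative: " + (narrowed < 0));

    // Reading the int back as unsigned restores the original long value.
    System.out.println("round trip ok: " + (Integer.toUnsignedLong(narrowed) == value));
  }
}
```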
POC code:
TODO: