Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing ZipEightByteInteger #614

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Conversation

Glavo
Copy link
Contributor

@Glavo Glavo commented Nov 25, 2024

Use unsigned long instead of BigInteger to simplify code and improve performance.

@garydgregory
Copy link
Member

garydgregory commented Nov 26, 2024

Hello @Glavo

I believe this PR breaks the ZIP specification. The package may not be in full compliance either.

Please rebase on git master to pick up ZipEightByteIntegerTest.testBIFromMaxValue().

In https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, I read

        4.4.1.1 All fields unless otherwise noted are unsigned and stored in Intel low-byte:high-byte, low-word:high-word order.

This means the max value for an unsigned 8 byte field is 18,446,744,073,709,551,615, which does not fit in a Java long.

WDYT?

@Glavo
Copy link
Contributor Author

Glavo commented Nov 26, 2024

Please rebase on git master to pick up ZipEightByteIntegerTest.testBIFromMaxValue().

Done. This PR can pass this test.

This means the max value for an unsigned 8 byte field is 18,446,744,073,709,551,615, which does not fit in a Java long.

We just need to use negative numbers to represent values ​​greater than Long.MAX_VALUE.

This is a standard practice, and Long also provides some convenient methods to operate these unsigned longs, such as toUnsignedString.

@Glavo
Copy link
Contributor Author

Glavo commented Nov 26, 2024

I added some more tests to prove that getLongValue and toString still behave the same way for very large numbers.

@Glavo
Copy link
Contributor Author

Glavo commented Nov 26, 2024

This means the max value for an unsigned 8 byte field is 18,446,744,073,709,551,615, which does not fit in a Java long.

Let me explain this in more detail.

Although byte, short, int, and long are treated as signed by default, they are essentially just 1, 2, 4, or 8 bytes, and the specific semantics depends on what the user thinks they are.

Since Java does not have unsigned integers, it is a common and standard practice to give unsigned semantics to byte, short, int, and long. A notable example is that JDK-4504839 and JDK-6322074 added methods to the standard library to help users more easily operate on these integer values ​​with unsigned semantics. In addition, methods such as ByteBuffer#getLong do essentially the same thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants