Stable hashing: add comments and tests concerning platform-independence #77319
r? @davidtwco (rust_highfive has picked a reviewer for you, use r? to override)
This is just a cleanup, by the way. Should be a non-functional change.
LGTM, but I'm not familiar with this code.
r? @nagisa
Overall LGTM, r=me, with or without additional tests added (although if they are not added, maybe file a task for somebody else?)
(Branch updated from c8b29b5 to eb0a88f.)
@bors r+
📌 Commit eb0a88f has been approved by
Thanks @nagisa. I realize now that there might not be any process like CI running unit tests against big-endian systems, which would mean that the added tests don't yet provide the value they promise. Would the best course of action be to add a BE system to CI?
It is a long-standing problem that may eventually be figured out in parallel. There are some third parties (such as Debian) that might eventually run these tests, so I think the tests as they are now are good enough.
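As a rough illustration of the kind of test under discussion, here is a minimal sketch that pins exact expected bytes, so that whichever host eventually runs it (little- or big-endian, 32- or 64-bit) checks the same property. The helper names `canonical_usize_bytes` and `canonical_isize_bytes` are hypothetical and are not taken from rustc; they only model the idea of widening pointer-sized integers to 64 bits and fixing a little-endian byte order before hashing.

```rust
/// Hypothetical helper (not rustc's API): widen a `usize` to 64 bits and
/// lay it out little-endian, so the result does not depend on the host's
/// pointer width or byte order.
fn canonical_usize_bytes(x: usize) -> [u8; 8] {
    (x as u64).to_le_bytes()
}

/// Hypothetical helper (not rustc's API): same idea for `isize`. The cast
/// to `i64` sign-extends on 32-bit hosts, so negative values are preserved.
fn canonical_isize_bytes(x: isize) -> [u8; 8] {
    (x as i64).to_le_bytes()
}
```

A test built on helpers like these compares against hard-coded byte arrays rather than against values computed on the host, which is what makes it meaningful when run on a big-endian or 32-bit machine.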
CI failed due to what seems to be a spurious network issue. Not sure if it needs to be kicked off again.
(Branch updated from eb0a88f to 5b3c2fa.)
Saw that CI failed in #77350 on 32-bit systems due to some tests that needed updating (
I just stumbled upon this comment, which explains that the double swapping that my PR removes was actually intentional. So I think this PR shouldn't merge after all, at least as it is now. What I intend to do is to roll back some of the changes, but make some improvements to the comments so that it is clear why things have been done the way they've been done. @nnethercote and @michaelwoerister, please chime in if you have any input. Thanks for the patience with the back and forth here. :)
The description says "Unit tests pass on both LE and BE systems" but subsequent comments suggest that statement might be wrong? I was very careful in #68914 to get everything correct for LE and BE and as far as I know the current code is correct. There was a ton of back-and-forth in that PR and Michael did various tests on BE machines. The code is confusing, and I added comments to make things clearer but there is probably room for further improvements there. (It is unfortunate that there is no BE testing on CI. The current state of the art is to request an account on the GCC compile farm and, once you've got that, test on one of their BE machines. Horrible, I know.) In summary: I would not be satisfied with the proposed changes in this PR unless you can show the current code does the wrong thing on BE machines, and I think you don't have that yet. But let me know if I've got that wrong.
The issue wasn't LE vs BE, but 32 vs 64 bit. I had tested on x86-64 and 32-bit big-endian MIPS. However, the
Oh no, the current code does the right thing on BE. I was just having a difficult time understanding the structure of the code, and thought that this would be clearer. But now I see from the other thread why things were done this way. My intent is to roll back most of my changes and just add some additional comments.
SipHasher128 implements short_write in an endian-independent way, yet its write_xxx Hasher trait methods undo this endian-independence by byte swapping the integer inputs on big-endian hardware. StableHasher then adds endian-independence back by also byte-swapping on big-endian hardware prior to invoking SipHasher128. This double swap may have the appearance of being a no-op, but is in fact by design. In particular, we really do want SipHasher128 to be platform-dependent, in order to be consistent with the libstd SipHasher. Try to clarify this intent. Also, add and update a couple of unit tests.
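To make the double swap concrete, here is a minimal sketch of the layering described above. It is a model under stated assumptions, not the rustc implementation: the function names are hypothetical, the target's endianness is simulated with a `big_endian` flag so both cases can be exercised on one machine, and the "hash" is reduced to the byte sequence that would be fed to the hash state.

```rust
/// Hypothetical model of an endian-independent `short_write`: it consumes
/// the integer *value*, emitting its bytes least-significant first, no
/// matter what machine it runs on.
fn short_write_model(x: u32) -> [u8; 4] {
    [x as u8, (x >> 8) as u8, (x >> 16) as u8, (x >> 24) as u8]
}

/// Models `u32::to_le` on a simulated target: a byte swap on big-endian,
/// the identity on little-endian.
fn to_le_on(x: u32, big_endian: bool) -> u32 {
    if big_endian { x.swap_bytes() } else { x }
}

/// Models the inner hasher's `write_u32` (assumption: it swaps on BE before
/// the short write). The net effect is that the bytes hashed equal the
/// target's native in-memory bytes, i.e. the result is platform-dependent.
fn inner_write_u32(x: u32, big_endian: bool) -> [u8; 4] {
    short_write_model(to_le_on(x, big_endian))
}

/// Models the outer, stable layer: it swaps on BE *before* calling the
/// inner hasher. The two swaps cancel on BE, so the bytes hashed are the
/// same on every target.
fn stable_write_u32(x: u32, big_endian: bool) -> [u8; 4] {
    inner_write_u32(to_le_on(x, big_endian), big_endian)
}
```

In this model, `inner_write_u32(0x11223344, true)` yields the big-endian byte order while `inner_write_u32(0x11223344, false)` yields the little-endian one, whereas `stable_write_u32` yields the little-endian order for both. The double swap is therefore a no-op only when the two layers are viewed together; each layer on its own has a distinct contract.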
(Branch updated from 5b3c2fa to d061fee.)
I've made revisions and updated the PR title and description accordingly. I hope this is an improvement. :)
@bors r+
📌 Commit d061fee has been approved by
…r=nnethercote

Stable hashing: add comments and tests concerning platform-independence

SipHasher128 implements short_write in an endian-independent way, yet its write_xxx Hasher trait methods undo this endian-independence by byte swapping the integer inputs on big-endian hardware. StableHasher then adds endian-independence back by also byte-swapping on big-endian hardware prior to invoking SipHasher128.

This double swap may have the appearance of being a no-op, but is in fact by design. In particular, we really do want SipHasher128 to be platform-dependent, in order to be consistent with the libstd SipHasher. Try to clarify this intent. Also, add and update a couple of unit tests.

---

Previous commit text:

~SipHasher128: fix platform-independence confusion~

~StableHasher is supposed to ensure platform independence by converting integers to little-endian and extending isize and usize to 64 bits as necessary, but in fact, much of that work is already handled by SipHasher128.~

~In particular, SipHasher128 implements short_write in an endian-independent way, yet both StableHasher and SipHasher128 additionally attempt to achieve endian-independence by byte swapping on BE hardware before invoking short writes. This double swap has no effect, so let's remove it.~

~Because short_write is endian-independent, SipHasher128 is already handling part of the platform-independence, and it would be somewhat difficult to make it *not* handle that part with the current implementation. As splitting platform-independence responsibilities between StableHasher and SipHasher128 would be confusing, let's make SipHasher128 handle all of it.~

~Finally, update some incorrect comments and increase test coverage. Unit tests pass on both LE and BE systems.~
Rollup of 12 pull requests

Successful merges:

- rust-lang#76909 (Add Iterator::advance_by and DoubleEndedIterator::advance_back_by)
- rust-lang#77153 (Fix recursive nonterminal expansion during pretty-print/reparse check)
- rust-lang#77202 (Defer Apple SDKROOT detection to link time.)
- rust-lang#77303 (const evaluatable: improve `TooGeneric` handling)
- rust-lang#77305 (move candidate_from_obligation_no_cache)
- rust-lang#77315 (Rename AllocErr to AllocError)
- rust-lang#77319 (Stable hashing: add comments and tests concerning platform-independence)
- rust-lang#77324 (Don't fire `const_item_mutation` lint on writes through a pointer)
- rust-lang#77343 (Validate `rustc_args_required_const`)
- rust-lang#77349 (Update cargo)
- rust-lang#77360 (References to ZSTs may be at arbitrary aligned addresses)
- rust-lang#77371 (Remove trailing space in error message)

Failed merges:

r? `@ghost`