Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fast, byte-oriented Hash derive #2075

Closed
jswrenn opened this issue Nov 18, 2024 · 3 comments
Closed

Add fast, byte-oriented Hash derive #2075

jswrenn opened this issue Nov 18, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@jswrenn
Copy link
Collaborator

jswrenn commented Nov 18, 2024

For types that are IntoBytes, we could derive optimized implementations of Hash.

Adding derive(Hash) to a struct generates a recursive descent into the fields of the struct. Instead, for a type that is IntoBytes, we could relatively easily have derive(zerocopy::Hash) expand to something like this:

impl core::hash::Hash for UserTy
where
    UserTy: ::zerocopy::IntoBytes,
{
    fn hash<H: core::hash::Hasher>(&self, state: &mut H) {
        core::hash::Hasher::write(
            state,
            ::zerocopy::IntoBytes::as_bytes(self)
        )
    }

    fn hash_slice<H: core::hash::Hasher>(data: &[Self], state: &mut H)
    where
        Self: Sized,
    {
        core::hash::Hasher::write(
            state,
            ::zerocopy::IntoBytes::as_bytes(data)
        )
    }
}

Alternatively, if we run with the assumption that Hashers benefit from writing the greatest number of bytes at a time, we can leverage the known size and well-alignedness of the user type to emit calls to state.write_u128, state.write_u64, state.write_u32, and so on. A trailing DST suffix, if any, can be serialized with state.write(trailing.as_bytes()).

We should benchmark these approaches. I don't have a strong intuition for the performance characteristics of Rust's leading Hasher implementations. My guess is that we couldn't do better in the more complicated approach than a well optimized Hasher::write could do, so we're best off doing the simple thing.

@jswrenn jswrenn added the enhancement New feature or request label Nov 18, 2024
@jswrenn
Copy link
Collaborator Author

jswrenn commented Nov 19, 2024

We should benchmark these approaches. I don't have a strong intuition for the performance characteristics of Rust's leading Hasher implementations. My guess is that we couldn't do better in the more complicated approach than a well optimized Hasher::write could do, so we're best off doing the simple thing.

On further investigation, I don't think there's any benefit to the more complex approach. We should assume that Hasher::write is well-optimized, as it is for:

max-heller added a commit to max-heller/zerocopy that referenced this issue Dec 14, 2024
max-heller added a commit to max-heller/zerocopy that referenced this issue Dec 14, 2024
The standard library's derive for `Hash` generates a recursive descent
into the fields of the type it is applied to. This commit adds an
alternative derive that generates an optimized, byte-oriented `Hash`
implementation for types that implement `IntoBytes`. Instead of a
recursive descent, the generated implementation makes a single call
to `Hasher::write()` in both `Hash::hash()` and `Hash::hash_slice()`,
feeding the hasher the bytes of the type or slice all at once.

Resolves google#2075
max-heller added a commit to max-heller/zerocopy that referenced this issue Dec 23, 2024
The standard library's derive for `Hash` generates a recursive descent
into the fields of the type it is applied to. This commit adds a
`ByteHash` derive that generates an optimized, byte-oriented `Hash`
implementation for types that implement `IntoBytes`. Instead of a
recursive descent, the generated implementation makes a single call
to `Hasher::write()` in both `Hash::hash()` and `Hash::hash_slice()`,
feeding the hasher the bytes of the type or slice all at once.

Resolves google#2075
max-heller added a commit to max-heller/zerocopy that referenced this issue Dec 24, 2024
The standard library's derive for `Hash` generates a recursive descent
into the fields of the type it is applied to. This commit adds a
`ByteHash` derive that generates an optimized, byte-oriented `Hash`
implementation for types that implement `IntoBytes`. Instead of a
recursive descent, the generated implementation makes a single call
to `Hasher::write()` in both `Hash::hash()` and `Hash::hash_slice()`,
feeding the hasher the bytes of the type or slice all at once.

Resolves google#2075
github-merge-queue bot pushed a commit that referenced this issue Dec 24, 2024
The standard library's derive for `Hash` generates a recursive descent
into the fields of the type it is applied to. This commit adds a
`ByteHash` derive that generates an optimized, byte-oriented `Hash`
implementation for types that implement `IntoBytes`. Instead of a
recursive descent, the generated implementation makes a single call
to `Hasher::write()` in both `Hash::hash()` and `Hash::hash_slice()`,
feeding the hasher the bytes of the type or slice all at once.

Resolves #2075
@jswrenn
Copy link
Collaborator Author

jswrenn commented Jan 3, 2025

Partially completed in #2159; still needs a port to 0.9.

@joshlf
Copy link
Member

joshlf commented Jan 30, 2025

Given that #2159 is labeled needs-backport and given that our 0.9 release roadmap lists needs-backport as a blocker, I think we can close this issue. Reopen if you disagree.

@joshlf joshlf closed this as completed Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants