Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Hash for HashSet #21182

Closed
mdinger opened this issue Jan 15, 2015 · 7 comments
Closed

Implement Hash for HashSet #21182

mdinger opened this issue Jan 15, 2015 · 7 comments
Labels
A-collections Area: `std::collection`

Comments

@mdinger
Copy link
Contributor

mdinger commented Jan 15, 2015

This doesn't work but should be possible somehow. @chris-morgan @huonw @Aatch and others found a method to implement one: IRC log. Exact solution link here

use std::collections::HashSet;

#[derive(Hash, Eq, PartialEq)]
enum Enum {
    A,
    B(HashSet<Enum>),
}

fn main(){
    let mut set = HashSet::new();

    set.insert(Enum::A);
}
@chris-morgan
Copy link
Member

I would think that an implementation should be on HashMap and let HashSet derive it. Does this seem reasonable?

@kmcallister kmcallister added A-libs A-collections Area: `std::collection` labels Jan 16, 2015
@Gankra
Copy link
Contributor

Gankra commented Jan 21, 2015

It's impossible to implement Hash for HashSet, as far as I know. You would need some way to deterministically order all the elements in the HashSet for hashing, in a hasher-state and insertion-order independent way.

Since HashMap only requires Eq and Hash, and hashers have random state, there is no such ordering.

@Gankra Gankra closed this as completed Jan 21, 2015
@Gankra
Copy link
Contributor

Gankra commented Jan 21, 2015

I suppose you could have the impl be dependent on also happening to have Ord, and either take O(n^2) time and O(1) space, or O(nlogn) time and O(n) space, but both are pretty unsavoury options!

@Gankra
Copy link
Contributor

Gankra commented Jan 21, 2015

I suggest using EnumSet or TrieSet in collect-rs if you need some order-dependent information like this.

@Gankra
Copy link
Contributor

Gankra commented Apr 15, 2015

Note for nitpickers: You can synthesize a Hash using your favourite commutative function (+, *, XOR), but this would substantially undermine the security properties provided by SipHash. We obtain security through siphash by mixing values through the hasher, which is fundamentally order-dependent. This ensures that e.g. "tac" and "cat" don't hash to the same thing. However of course for a HashSet you would want [t, a, c] and [c, a, t] to hash to the same thing.

If you do this through a commutative function mix on top of the hashes of the individual elements, then I can theoretically produce a DDOS against a HashSet<HashSet<T>> by finding pairs of elements such that mix(hash(x), hash(y)) == 0. Given k such (disjoint) pairs, I can produce 2k HashSets of size at most 2k that all collide. This will produce O(n2) behaviour for operations that should take O(n) time.

Now of course this is a less fundamental flaw than this flaw existing for all keys, especially since a HashMap or HashSet as a key is a really niche thing. So perhaps this is an acceptable problem.

Finally because we use robin hood hashing, we're extra vulnerable to bad hashing because even "clumpy" distributions are a performance hazard.

@peterkelly
Copy link

While the elements don't necessarily have an ordering, their hashes do. This could be implemented by obtaining the hashes of all elements, sorting them, and then iterating through them to compute a hash for the set.

@the8472
Copy link
Member

the8472 commented Jan 28, 2022

#91837 implemented it for a rustc-internal hashet by using commutative operations.
But that approach can't be applied to the public HashSet because it requires intermediate hashers to be constructable. I.e. it would need something like impl<H, T> Hash<H> for HashSet<T> where H: Hasher + Default , but Hash has the generics on the methods rather than the trait itself, so that approach isn't applicable here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collection`
Projects
None yet
Development

No branches or pull requests

6 participants