Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash implementation for Path ignores path separators #127254

Closed
zanieb opened this issue Jul 3, 2024 · 4 comments · Fixed by #127297
Closed

Hash implementation for Path ignores path separators #127254

zanieb opened this issue Jul 3, 2024 · 4 comments · Fixed by #127297
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@zanieb
Copy link
Contributor

zanieb commented Jul 3, 2024

I tried this code:

use std::path::PathBuf;
use std::hash::{DefaultHasher, Hasher, Hash};

fn main() {
    let x = PathBuf::from("hello/world");
    let y = PathBuf::from("helloworld");
    
    let mut hasher = DefaultHasher::new();
    x.hash(&mut hasher);
    println!("{}", hasher.finish());

    let mut hasher = DefaultHasher::new();
    y.hash(&mut hasher);
    println!("{}", hasher.finish());
}

I expected to see this happen: Different hashes since these represent different paths.

Instead, this happened: The hashes are equal.

This makes sense for redundant path separators (e.g. //), but not for all path separators.


I added a test case to the standard library — which fails.

diff --git a/library/std/src/path/tests.rs b/library/std/src/path/tests.rs
index 92702b395df..ed945a764c5 100644
--- a/library/std/src/path/tests.rs
+++ b/library/std/src/path/tests.rs
@@ -1566,6 +1566,13 @@ macro_rules! tc (
     relative_from: Some("bar")
     );
 
+    tc!("foo/bar", "foobar",
+    eq: false,
+    starts_with: false,
+    ends_with: false,
+    relative_from: None
+    );
+
     tc!("foo/bar/baz", "foo/bar",
     eq: false,
     starts_with: true,
---- path::tests::test_compare stdout ----
thread 'path::tests::test_compare' panicked at library/std/src/path/tests.rs:1569:5:
"foo/bar" == "foobar", expected false, got 4111386826608627770 and 4111386826608627770

The relevant code is probably around

rust/library/std/src/path.rs

Lines 3112 to 3114 in 6292b2a

// skip over separator and optionally a following CurDir item
// since components() would normalize these away.
component_start = i + 1;
which last changed in 45082b0. If I revert that commit, the error persists. It looks like the relevant change is a083dd6, if I revert that commit the test passes. This was the solution I opted for downstream (in astral-sh/ruff#12159) before performing this investigation.

Meta

I tested this on stable (1.79.0), nightly (1.81.0), and on master (6292b2a).

Discovered via astral-sh/ruff#12158

@zanieb zanieb added the C-bug Category: This is a bug. label Jul 3, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 3, 2024
@the8472
Copy link
Member

the8472 commented Jul 3, 2024

Hash is not a cryptographic hash, so it does not promise a collision probability so low that practically all inputs map to distinct outputs. So if the correctness of your program relies on the absence of collisions then it's your program that's wrong, not Hash.

There may be some quality-of-implementation argument about avoiding collisions for data patterns that are likely to coexist. I'm not sure if it is the case here, I'll have to think about it. And even if it is then there also is the matter of making such a avoidance fast. The current code is optimized for speed.

@zanieb
Copy link
Contributor Author

zanieb commented Jul 3, 2024

I think that's a fair point, but the current implementation already goes out if its way to create equal hashes for paths like foo/./bar and foo/bar — why is foobar and foo/bar different? It seems like a very surprising collision for a user to encounter.

Thanks for taking a look!

@the8472
Copy link
Member

the8472 commented Jul 3, 2024

I think that's a fair point, but the current implementation already goes out if its way to create equal hashes for paths like foo/./bar and foo/bar — why is foobar and foo/bar different?

The difference is that Hash is a companion for Eq. If things compare equal then they must hash the same because that's what hashmaps rely on. foo/./bar and foo/bar do compare equally because Components normalizes the dot away. Therefore they must hash equally, which is why Hash must spend effort on that.

Things comparing inequal does not require the Hash to be different. Instead it's only a matter of quality-of-implementation whether they produce distinct hashes, not a matter of correctness.

@zanieb
Copy link
Contributor Author

zanieb commented Jul 3, 2024

That's a great point, thanks for clarifying. I agree this is not a "correctness" problem, but more of a practical consideration for the hashing algorithm.

@the8472 the8472 added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs Relevant to the library team, which will review and decide on the PR/issue. A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` and removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 3, 2024
@bors bors closed this as completed in 56557c4 Jul 7, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jul 7, 2024
Rollup merge of rust-lang#127297 - the8472:path-new-hash, r=Nilstrieb

Improve std::Path's Hash quality by avoiding prefix collisions

This adds a bit rotation to the already existing state so that the same sequence of characters chunked at different offsets into separate path components results in different hashes.

The tests are from rust-lang#127255

Closes rust-lang#127254
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
3 participants