WIP: Implement stable symbol-name generation algorithm. #31539

michaelwoerister · 2016-02-10T17:43:44Z

This PR changes the way symbol names are generated by the compiler. The new algorithm reflects the current state of the discussion over at rust-lang/rfcs#689.

Once it is done, it will also fix issue #30330. I want to add a test case for that before closing it though.

I also want to do some performance tests. The new algorithm does a little more work than the previous one due to various reasons, and it might make sense to adapt it in a way that allows it to be implemented more efficiently.

@nikomatsakis: It would be nice if there was a way of finding out if a DefPath refers to something in the current crate or in an external one. The information is already there, it's just not accessible at the moment. I'll probably propose some minor changes there, together with some facilities to allow for accessing DefPaths without allocating a Vec for them.

TODO

~~Actually "crate qualify" symbols, as promised in the docs.~~
~~Add a test case showing that symbol names are deterministic~~.
Maybe add a test case showing that symbol names are stable against small code changes.

One thing that might be interesting to the @rust-lang/compiler team:
I've used SipHash exclusively now for generating symbol hashes. Previously it was only used for monomorphizations and the rest of the code used a truncated version on SHA256. Is there any benefit to sticking to SHA? I don't really see one since we only used 64 bits of the digest anyway, but maybe I'm missing something? ==> Just switched things back to SHA-2 for now.

rust-highfive · 2016-02-10T17:43:59Z

r? @Aatch

(rust_highfive has picked a reviewer for you, use r? to override)

bluss · 2016-02-10T18:49:19Z

SipHash it is an algorithm designed by cryptographers, but it's not in the same category of hash functions as SHA-2 which is a cryptographic hash function.

For example, the SipHash paper says

In general, building a MAC from a general-
purpose cryptographic hash function appears to be a highly suboptimal ap-
proach: general-purpose cryptographic hash functions perform many extra com-
putations for the goal of collision resistance on public inputs, while MACs have
secret keys and do not need collision resistance.

and

We comment that SipHash is not meant to be, and (obviously) is not, collision-
resistant.

This use case does not use a secret key with SipHash (it always uses the default key), and it wants to have collision resistance on public inputs. Yes, in practice, we can find collisions for either of them if the hash is only 64 bits long, but only SHA-2 is designed for this case.

michaelwoerister · 2016-02-10T20:19:14Z

@bluss Those are good references. It really sounds like SipHash is not a good candidate for this use case. I'll switch things to SHA-2.

bluss · 2016-02-11T13:16:12Z

What's the impact of a collision?

bors · 2016-02-11T14:55:57Z

☔ The latest upstream changes (presumably #31487) made this pull request unmergeable. Please resolve the merge conflicts.

michaelwoerister · 2016-02-11T16:33:38Z

What's the impact of a collision?

Symbol conflicts: Two functions with different signatures could end up with the same name. If it's in the same crate, compilation will probably just fail. If it's in different crates, either linking fails due to a conflict or the wrong symbol might be linked, resulting most probably in untraceable, subtle memory corruption.

nikomatsakis · 2016-02-11T19:17:55Z

Nice.

nikomatsakis · 2016-02-11T19:18:14Z

cc @alexcrichton @brson

alexcrichton · 2016-02-11T23:44:28Z

This looks fantastic to me, nice work @michaelwoerister!

I wouldn't have too many opinions about SHA2 vs SipHash, I suspect either choice will be suitable for our purposes. I'd personally love to jettison the SHA2 implementation in the compiler, and I highly doubt we can find a SipHash practical collision with the same path, different metadata, and yet something like different type substitutions. We should also be able to change this if we ever need to really as we can only guarantee no recompiles within one version of Rust anyway.

michaelwoerister · 2016-02-12T09:21:51Z

Thanks for the comment @alexcrichton. It's true, we have been using SipHash for function monomorphization for quite a while now and I've never seen a problem with that.

My assumption about the requirements for this hash function (and later also one used for fingerprinting incremental compilation artifacts) is that it should provide good distribution so that collision probability is as low as it can theoretically be for a given number of hash value bits, BUT that we don't care about "security", i.e. whether it's easy to find hash collisions on purpose. Or is there some kind of "attack scenario" that I'm overlooking?

For maximum performance (and if the above assumptions hold), we could also use something like CRC64.
Anyway, I think for this PR, I'll just leave SHA-2 in.

bluss · 2016-02-12T09:22:38Z

Here's the issue about collisions in type_id, but that's only tangential to this: #10389. As long as we go into this with the knowledge that SipHash is just a regular hash function, I'm fine.

michaelwoerister · 2016-02-12T09:27:41Z

@bluss Thanks for the link!

michaelwoerister · 2016-02-12T10:13:56Z

As an aside, I think "-C metadata" is a bit of a misnomer. Something like "-C salt" or "-C disambiguator" would be clearer (although the later one definitely is too long).

michaelwoerister · 2016-02-12T11:44:23Z

Another side note: Now that we get predictable symbol names for monomorphizations, we might want to start sharing them cross-crate.

alexcrichton · 2016-02-12T19:02:18Z

Ah as another data point in Cargo we extensively use SipHash for fingerprinting crates and packages. I haven't at least seen anyone run into any problems with that.

I also agree that at this point -C metadata is basically just wrong, but I don't think we can rename due to the extensive usage in Cargo right now :(

bors · 2016-02-13T03:22:25Z

☔ The latest upstream changes (presumably #31524) made this pull request unmergeable. Please resolve the merge conflicts.

arielb1 · 2016-02-13T12:35:45Z

What's the problem with the compiler SHA2 implementation? Making a collision with SipHash with a known key only requires some modifiable string on both ends - e.g. an item name, or comment.

michaelwoerister · 2016-02-13T17:16:06Z

@arielb1 I don't think anything is wrong with the SHA-2 implementation per se. It's just incomplete (only supports SHA256) and it's one more piece of code that needs to be maintained. I'd guess that second point is the main motivation for wanting to get rid of it.

michaelwoerister · 2016-02-13T18:05:27Z

So, here's something interesting: Some of the test cases (e.g. run-pass/typeid-intrinsic.rs) link to an external crate that has exactly the same name, and thus the crate graph ends up with two identical crates in it. So far, this was no problem because the two crates still have a different SVH but under the new scheme, two crates are considered the same if they have the same name/metadata pair.

The tests now fail with 'duplicate symbol' linker errors, but I guess the sensible thing to do here, is to emit a compilation error when the crate loader discovers two crates with the same name/metadata but different SVHs.

michaelwoerister · 2016-02-15T20:47:12Z

Up until that last commit this passes make check for me locally, with the last commit it looks good too, except that the run-pass/backtrace.rs test case fails (it seems to panic while trying to write the backtrace). Does anybody have an idea what that could be about? In general, creating backtraces seems to work just fine...

michaelwoerister · 2016-02-15T21:14:05Z

That's the error in question: https://travis-ci.org/rust-lang/rust/builds/109449298#L6591

alexcrichton · 2016-02-16T01:37:33Z

Oh you should just be able to update the test to check for backtrace::foo instead of only foo, I believe that previously the crate name was never on symbols, just in the debug description of symbols.

michaelwoerister · 2016-02-16T11:56:46Z

@alexcrichton I'm afraid it's something more complicated than that. If you look at the actual backtrace that the subprocess provides (see below), it looks like there is a panic while creating the backtrace (in std::sys::backtrace::tracing::imp::write in gcc_s.rs). Something causes an error there.

error: test run failed!
status: exit code: 101
command: x86_64-unknown-linux-gnu/test/run-pass/backtrace.stage2-x86_64-unknown-linux-gnu 
stdout:
------------------------------------------

------------------------------------------
stderr:
------------------------------------------
thread '<main>' panicked at 'bad output: thread '<main>' panicked at 'explicit panic', /home/travis/build/rust-lang/rust/src/test/run-pass/backtrace.rs:23
stack backtrace:
   1:     0x2b4313b5fec0 - std::sys::backtrace::tracing::imp::write::h5cb8831505517f1a
   2:     0x2b4313b68e3b - std::panicking::default_handler::_$u7b$$u7b$closure$u7d$$u7d$::h7d4838ac3001375c
   3:     0x2b4313b68a08 - std::panicking::default_handler::heb8f66506b25fc53
   4:     0x2b4313b31ac6 - std::sys_common::unwind::begin_unwind_inner::h896acc46de4d4a1c
   5:     0x2b4313671e44 - std::sys_common::unwind::begin_unwind::h72328ba1025302f0
                        at src/libstd/sys/common/unwind/mod.rs:236
   6:     0x2b4313671dd3 - backtrace::foo::h2bfe25e162df4c7c
                        at /home/travis/build/rust-lang/rust/<std macros>:3
   7:     0x2b43136732a5 - backtrace::main::h1e5dc0fe5e6f026b
                        at src/test/run-pass/backtrace.rs:98
   8:     0x2b4313b68474 - std::sys_common::unwind::try::try_fn::hbb3b06ed012dffa6
   9:     0x2b4313b5d82b - __rust_try
  10:     0x2b4313b680b1 - std::rt::lang_start::ha4a60ae3cf761e9f
  11:     0x2b431424eec4 - __libc_start_main
  12:     0x2b4313671c28 - <unknown>
  13:                0x0 - <unknown>
', /home/travis/build/rust-lang/rust/src/test/run-pass/backtrace.rs:54
note: Run with `RUST_BACKTRACE=1` for a backtrace.

------------------------------------------

alexcrichton · 2016-02-16T18:01:44Z

Hm, that looks normal to me though? It looks like the assertion message originates from here which was testing a panic originating here. The bad output happened because it expected to find - foo where it instead printed - backtrace::foo.

In other words we should probably really look into skipping the top few frames when printing a backtrace, but other than that it should be good!

michaelwoerister · 2016-02-17T05:12:57Z

Yes, you're right of course :)
The test passes just fine now.

…s with the same crate-name and crate-salt but different SVHs.

…runtime. We want to prevent compiling something against one version of a dynamic library and then, at runtime accidentally using a different version of the dynamic library. With the old symbol-naming scheme this could not happen because every symbol had the SVH in it and you'd get an error by the dynamic linker when using the wrong version of a dylib. With the new naming scheme this isn't the case any more, so this patch adds the "link-guard" to prevent this error case. This is implemented as follows: - In every crate that we compile, we emit a function called "__rustc_link_guard_<crate-name>_<crate-svh>" - The body of this function contains calls to the "__rustc_link_guard" functions of all dependencies. - An executable contains a call to it's own "__rustc_link_guard" function. As a consequence the "__rustc_link_guard" function call graph mirrors the crate graph and the dynamic linker will fail if a wrong dylib is loaded somewhere because its "__rustc_link_guard" function will contain a different SVH in its name.

michaelwoerister · 2016-03-15T11:07:33Z

@bors r=nikomatsakis (after rebase)

bors · 2016-03-15T11:07:33Z

📌 Commit f6b0f17 has been approved by nikomatsakis

bors · 2016-03-15T13:36:48Z

⌛ Testing commit f6b0f17 with merge 8c4d880...

@nikomatsakis

WIP: Implement stable symbol-name generation algorithm. This PR changes the way symbol names are generated by the compiler. The new algorithm reflects the current state of the discussion over at rust-lang/rfcs#689. Once it is done, it will also fix issue #30330. I want to add a test case for that before closing it though. I also want to do some performance tests. The new algorithm does a little more work than the previous one due to various reasons, and it might make sense to adapt it in a way that allows it to be implemented more efficiently. @nikomatsakis: It would be nice if there was a way of finding out if a `DefPath` refers to something in the current crate or in an external one. The information is already there, it's just not accessible at the moment. I'll probably propose some minor changes there, together with some facilities to allow for accessing `DefPaths` without allocating a `Vec` for them. **TODO** - ~~Actually "crate qualify" symbols, as promised in the docs.~~ - ~~Add a test case showing that symbol names are deterministic~~. - Maybe add a test case showing that symbol names are stable against small code changes. ~~One thing that might be interesting to the @rust-lang/compiler team: I've used SipHash exclusively now for generating symbol hashes. Previously it was only used for monomorphizations and the rest of the code used a truncated version on SHA256. Is there any benefit to sticking to SHA? I don't really see one since we only used 64 bits of the digest anyway, but maybe I'm missing something?~~ ==> Just switched things back to SHA-2 for now.

bors · 2016-03-15T13:51:29Z

💔 Test failed - auto-mac-64-nopt-t

nikomatsakis · 2016-03-15T19:32:19Z

failure looks legit http://buildbot.rust-lang.org/builders/auto-mac-64-nopt-t/builds/8420/steps/compile/logs/stdio

nikomatsakis · 2016-03-16T17:33:34Z

@michaelwoerister and I chatted a bit over e-mail. I'm going to close this PR and open one of my own that includes his commits plus a few more.

@michaelwoerister

Revamp symbol names for impls (and make them deterministic, etc) This builds on @michaelwoerister's epic PR #31539 (note that his PR never landed, so I just incorporated it into this one). The main change here is that we remove the "name" from `DefPathData` for impls, since that name is synthetic and not sufficiently predictable for incr comp. However, just doing that would cause bad symbol names since those are based on the `DefPath`. Therefore, I introduce a new mechanism for getting symbol names (and also paths for user display) called `item_path`. This is kind of simplistic for now (based on strings) but I expect to expand it later to support richer types, hopefully generating C++-mangled names that gdb etc can understand. Along the way I cleaned up how we track the path that leads to an extern crate. There is still some cleanup left undone here. Notably, I didn't remove the impl names altogether -- that would probably make sense. I also didn't try to remove the `item_symbols` vector. Mostly I want to unblock my other incr. comp. work. =) r? @eddyb cc @eddyb @alexcrichton @michaelwoerister

@michaelwoerister

…xcrichton Revamp symbol names for impls (and make them deterministic, etc) This builds on @michaelwoerister's epic PR #31539 (note that his PR never landed, so I just incorporated it into this one). The main change here is that we remove the "name" from `DefPathData` for impls, since that name is synthetic and not sufficiently predictable for incr comp. However, just doing that would cause bad symbol names since those are based on the `DefPath`. Therefore, I introduce a new mechanism for getting symbol names (and also paths for user display) called `item_path`. This is kind of simplistic for now (based on strings) but I expect to expand it later to support richer types, hopefully generating C++-mangled names that gdb etc can understand. Along the way I cleaned up how we track the path that leads to an extern crate. There is still some cleanup left undone here. Notably, I didn't remove the impl names altogether -- that would probably make sense. I also didn't try to remove the `item_symbols` vector. Mostly I want to unblock my other incr. comp. work. =) r? @eddyb cc @eddyb @alexcrichton @michaelwoerister

@michaelwoerister

…xcrichton Revamp symbol names for impls (and make them deterministic, etc) This builds on @michaelwoerister's epic PR #31539 (note that his PR never landed, so I just incorporated it into this one). The main change here is that we remove the "name" from `DefPathData` for impls, since that name is synthetic and not sufficiently predictable for incr comp. However, just doing that would cause bad symbol names since those are based on the `DefPath`. Therefore, I introduce a new mechanism for getting symbol names (and also paths for user display) called `item_path`. This is kind of simplistic for now (based on strings) but I expect to expand it later to support richer types, hopefully generating C++-mangled names that gdb etc can understand. Along the way I cleaned up how we track the path that leads to an extern crate. There is still some cleanup left undone here. Notably, I didn't remove the impl names altogether -- that would probably make sense. I also didn't try to remove the `item_symbols` vector. Mostly I want to unblock my other incr. comp. work. =) r? @eddyb cc @eddyb @alexcrichton @michaelwoerister

rust-highfive assigned Aatch Feb 10, 2016

michaelwoerister force-pushed the stable-symbols branch from f6a5588 to cc8d19d Compare February 11, 2016 16:11

michaelwoerister force-pushed the stable-symbols branch from fc2c4d8 to 7d7958b Compare February 12, 2016 18:17

michaelwoerister force-pushed the stable-symbols branch from 7d7958b to 0a2b84a Compare February 13, 2016 17:57

michaelwoerister force-pushed the stable-symbols branch from 0a2b84a to 13a5ea0 Compare February 15, 2016 20:43

michaelwoerister added 11 commits March 15, 2016 07:06

Make closures use stable symbol names.

eb4de06

Make drop glue use new symbol naming scheme.

4d1259e

Use new symbol naming scheme for fn-pointer-shims.

e3a78d8

Use new symbol naming scheme for fn-once-shims.

6351081

Use new symbol naming scheme for object shims.

9a1465c

Use new symbol names for items of various kinds.

a3dc048

Add a test to verify that we have reproducible compiler builds.

b327213

Salt test crates in buildsystem.

25cbff1

Make the compiler emit an error if the crate graph contains two crate…

9d5ce13

…s with the same crate-name and crate-salt but different SVHs.

Remove old symbol naming code.

00c206f

michaelwoerister force-pushed the stable-symbols branch from d534c75 to f6b0f17 Compare March 15, 2016 11:06

nikomatsakis closed this Mar 16, 2016

nikomatsakis mentioned this pull request Mar 16, 2016

Revamp symbol names for impls (and make them deterministic, etc) #32293

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Implement stable symbol-name generation algorithm. #31539

WIP: Implement stable symbol-name generation algorithm. #31539

michaelwoerister commented Feb 10, 2016

rust-highfive commented Feb 10, 2016

bluss commented Feb 10, 2016

michaelwoerister commented Feb 10, 2016

bluss commented Feb 11, 2016

bors commented Feb 11, 2016

michaelwoerister commented Feb 11, 2016

nikomatsakis commented Feb 11, 2016

nikomatsakis commented Feb 11, 2016

alexcrichton commented Feb 11, 2016

michaelwoerister commented Feb 12, 2016

bluss commented Feb 12, 2016

michaelwoerister commented Feb 12, 2016

michaelwoerister commented Feb 12, 2016

michaelwoerister commented Feb 12, 2016

alexcrichton commented Feb 12, 2016

bors commented Feb 13, 2016

arielb1 commented Feb 13, 2016

michaelwoerister commented Feb 13, 2016

michaelwoerister commented Feb 13, 2016

michaelwoerister commented Feb 15, 2016

michaelwoerister commented Feb 15, 2016

alexcrichton commented Feb 16, 2016

michaelwoerister commented Feb 16, 2016

alexcrichton commented Feb 16, 2016

michaelwoerister commented Feb 17, 2016

michaelwoerister commented Mar 15, 2016

bors commented Mar 15, 2016

bors commented Mar 15, 2016

bors commented Mar 15, 2016

nikomatsakis commented Mar 15, 2016

nikomatsakis commented Mar 16, 2016

WIP: Implement stable symbol-name generation algorithm. #31539

WIP: Implement stable symbol-name generation algorithm. #31539

Conversation

michaelwoerister commented Feb 10, 2016

rust-highfive commented Feb 10, 2016

bluss commented Feb 10, 2016

michaelwoerister commented Feb 10, 2016

bluss commented Feb 11, 2016

bors commented Feb 11, 2016

michaelwoerister commented Feb 11, 2016

nikomatsakis commented Feb 11, 2016

nikomatsakis commented Feb 11, 2016

alexcrichton commented Feb 11, 2016

michaelwoerister commented Feb 12, 2016

bluss commented Feb 12, 2016

michaelwoerister commented Feb 12, 2016

michaelwoerister commented Feb 12, 2016

michaelwoerister commented Feb 12, 2016

alexcrichton commented Feb 12, 2016

bors commented Feb 13, 2016

arielb1 commented Feb 13, 2016

michaelwoerister commented Feb 13, 2016

michaelwoerister commented Feb 13, 2016

michaelwoerister commented Feb 15, 2016

michaelwoerister commented Feb 15, 2016

alexcrichton commented Feb 16, 2016

michaelwoerister commented Feb 16, 2016

alexcrichton commented Feb 16, 2016

michaelwoerister commented Feb 17, 2016

michaelwoerister commented Mar 15, 2016

bors commented Mar 15, 2016

bors commented Mar 15, 2016

bors commented Mar 15, 2016

nikomatsakis commented Mar 15, 2016

nikomatsakis commented Mar 16, 2016