Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vendor-specific suffixes to v0 mangling RFC 2603 #3224

Merged
merged 1 commit into from
Mar 22, 2022

Conversation

lqd
Copy link
Member

@lqd lqd commented Jan 20, 2022

This PR updates v0 mangling to add an Itanium ABI-like vendor-specific suffix to symbols.

This already happens in practice (e.g. with LLVM LTO to ensure a name is globally unique) and the demanglers following the spec strictly can't handle these symbols. This PR makes them officially part of the spec.

The vendor-specific suffix is optional and doesn't change semantics: it is IMO safe to ignore in the vast majority of cases.

Improvements to the wording and description would be welcome. Opening the PR for possible discussion at today's t-compiler meeting, and possible FCP later.

cc @michaelwoerister

@safinaskar

This comment has been minimized.

@lqd lqd changed the title Add vendor-specific suffixes to v0 mangling RFC 2063 Add vendor-specific suffixes to v0 mangling RFC 2603 Jan 23, 2022
@lqd
Copy link
Member Author

lqd commented Jan 23, 2022

Fixed

@eddyb
Copy link
Member

eddyb commented Jan 23, 2022

So I was talking to @lqd about my confusion around the Itanium ABI mangling not having such a general mechanism, and @lqd found that it was added in itanium-cxx-abi/cxx-abi#36, relatively recently!

Especially if someone doesn't find that repo, they're likely reading outdated versions without vendor suffixes.

The GNU demangler (in libiberty and used by e.g. c++filt) are not complying with it, they only handle the GCC .clone suffix

Oh, wow, I was wrong, c++filt will happily demangle a path (like _ZN3foo3barE into foo::bar), i.e. without a function signature, but they consider the vendor suffix part of the signature, so those don't get demangled without the signature (also, they kept their "clone" terminology even with other suffixes):

$ c++filt _ZN3foo3barE _ZN3foo3barEv _ZN3foo3barE.llvm.123 _ZN3foo3barEv.llvm.123
foo::bar
foo::bar()
_ZN3foo3barE.llvm.123
foo::bar() [clone .llvm.123]

I think it was a bad idea for this to be "nested" in a C++ symbol, because you then get weird stuff like that, and ideally it could be handled uniformly for every symbol name (after all, LLVM adds its suffixes without knowing about the mangling of the original symbol name, be it C++, Rust, unmangled, some other thing, etc.).

But specifying it in the RFC won't hurt, I suppose.

@ehuss ehuss added the T-compiler Relevant to the compiler team, which will review and decide on the RFC. label Jan 27, 2022
@michaelwoerister
Copy link
Member

Alright, let's try to move this forward: @rfcbot fcp merge

@michaelwoerister
Copy link
Member

@rfcbot merge

@rfcbot
Copy link
Collaborator

rfcbot commented Feb 3, 2022

Team member @michaelwoerister has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Feb 3, 2022
Comment on lines 751 to 753
// There are no restrictions on the characters that may be used
// in the suffix following the `.`.
<vendor-specific-suffix> = "." <suffix>
Copy link
Member

@pierwill pierwill Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the suffix contain any characters at all? Or do these characters need to be in A-Z, a-z, 0-9, and _ as in RFC 2603?

As written this sounds like, for instance, arbitrary Unicode would be allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// There are no restrictions on the characters that may be used
// in the suffix following the `.`.
<vendor-specific-suffix> = "." <suffix>
// The suffix following `.` can contain any character in the allowed alphanumeric set.
<vendor-specific-suffix> = "." <suffix>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have no control over what external parties like LLVM stick into that suffix, we should indeed strive for allowing anything at all. LLVM for example will add suffixes that contain more . characters, like .llvm.3285396211802591752 or .cold.1.

@@ -746,6 +747,10 @@ Mangled names conform to the following grammar:
// We use <path> here, so that we don't have to add a special rule for
// compression. In practice, only a crate root is expected.
<instantiating-crate> = <path>

// There are no restrictions on the characters that may be used
// in the suffix following the `.`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also allow $ in addition to .? I think tools out there are supposed to expect this as well, because of Itanium:

Mangled names containing $ or . are reserved for private implementation use. Names produced using such extensions are inherently non-portable and should be given internal linkage where possible.

Here's an example of someone finding a Rust symbol with a $ suffix rather than a . suffix: https://www.reddit.com/r/rust/comments/t200fl/where_does_in_mangled_name_came_from/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, yes. Seems likely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that this should be added.

@Havvy
Copy link
Contributor

Havvy commented Feb 27, 2022

Is the v0 mangling scheme documented somewhere other than this RFC? Or, why is this an amendment of an already accepted RFC instead of a brand new one?

@bstrie
Copy link
Contributor

bstrie commented Feb 27, 2022

@Havvy The original v0 mangling RFC was this one: #2603 . I assume this is an amendment in order to reduce confusion for anyone looking for the canonical grammar who will most likely consult the page at https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html .

@Havvy
Copy link
Contributor

Havvy commented Feb 28, 2022

@bstrie Yes, and this goes against historical precedent of generally not updating RFCs with substantial content after they are approved. The mangling scheme, as it is is implemented, should be documented somewhere not in the RFCs repo and we could probably update the RFC to link to that. But I don't know where that is, if anywhere. And if it's not anywhere, then we should figure out where it should go and put it there.

@nikomatsakis
Copy link
Contributor

@rfcbot reviewed

@eddyb
Copy link
Member

eddyb commented Feb 28, 2022

Yes, and this goes against historical precedent of generally not updating RFCs with substantial content after they are approved.

In that case, I'm not sure we've been consistent across all RFCs. When I asked about this, I was directed to make amendments (there's been a few: https://github.com/rust-lang/rfcs/pulls?q=2603+OR+RFC2603 - though I regret not trying to make them more consistent name-wise).

@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Feb 28, 2022
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 28, 2022

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Feb 28, 2022
@eddyb
Copy link
Member

eddyb commented Mar 2, 2022

@Havvy Oh, another thing to keep in mind is, #2705, #2879 and this PR all fix the wording to "reflect reality" when this mangling scheme was first implemented.

In the case of this PR, "vendor-specific suffixes" were being added by LLVM and stripped by rustc-demangle starting with the very first nightly you could use -Z symbol-mangling-version=v0.

The only amendment that wouldn't fit that description would be #3161 - about half of it could be applied to the RFC itself in order to clarify the original grammar in more detail (we were too handwavey initially), with the rest being a proper extension. That's where I'm not sure what to do about. I'll leave a comment on #3161.

@lqd
Copy link
Member Author

lqd commented Mar 2, 2022

Thanks to @bstrie for bringing up https://www.reddit.com/r/rust/comments/t200fl/where_does_in_mangled_name_came_from/ which contains another vendor-specific mangling use case: thread local data on mach-o uses $.

@michaelwoerister and I feel it's the same de facto situation as the period separator happening in the wild, and that we should add both vendor-specific separators. I've therefore updated the PR to mention this, as a separate commit for easier review and possible comments.

(I will squash everything before the PR gets merged, as there's also another cosmetic commit which correctly wraps at 80 chars like the rest of the text)

@michaelwoerister
Copy link
Member

Is the v0 mangling scheme documented somewhere other than this RFC? Or, why is this an amendment of an already accepted RFC instead of a brand new one?

The mangling scheme is not documented anywhere else, as far as I know. As @bstrie says, I think it's good to have a single source of truth for the definition that does not require people to chase down all the updates that might be found somewhere else. If that source of truth would move to some other place (like e.g. the Rust reference), then making additional RFCs instead of amending the original one would be fine too, in my opinion, but as long as we don't have that, I personally prefer to update the RFC in-place for minor things like this. My intuition would be different for larger, breaking changes.

@bstrie
Copy link
Contributor

bstrie commented Mar 2, 2022

Regarding "where is the definitive reference for this", it does seem like this is begging for a compiler-level reference to complement the existing language-level reference. I guess the rustc book ( https://doc.rust-lang.org/rustc/ ) might fill this role?

@michaelwoerister
Copy link
Member

The rustc book still seems to be aimed at end users. I'm not sure if that would be a good fit. But I don't have a strong opinion one way or other. It would be good to have some canonical place for having information like this (for example, documentation about the shape of our debuginfo could go there too).

@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this RFC. label Mar 10, 2022
@rfcbot rfcbot added to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Mar 10, 2022
@rfcbot
Copy link
Collaborator

rfcbot commented Mar 10, 2022

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@lqd lqd force-pushed the rfc-2063-vendor-suffixes branch from 9b2672d to fef292b Compare March 21, 2022 09:00
@lqd
Copy link
Member Author

lqd commented Mar 21, 2022

Note: we didn't squash and merge as soon as the FCP finished just to allow a bit more time for comments, especially about the similar case of $ which was noticed and incorporated in this PR a couple days into the FCP.

Now that this additional time has passed, I've squashed everything, and this should be ready to go.

@michaelwoerister michaelwoerister merged commit b1de058 into rust-lang:master Mar 22, 2022
@lqd lqd deleted the rfc-2063-vendor-suffixes branch March 22, 2022 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-compiler Relevant to the compiler team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants