Stabilize reserved prefixes #88140

Closed
nikomatsakis opened this issue Aug 18, 2021 · 10 comments
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Contributor

Reserved prefixes stabilization report

Links

Summary

  • In Rust 2021, any_identifier#, any_identifier"...", and any_identifier'...' are
    reserved syntax and no longer tokenize.
  • This is mostly relevant to macros. E.g. quote!{ #a#b } is no longer accepted.
  • It doesn't treat keywords specially, so e.g. match"..." {} is no longer accepted.
  • Insert whitespace between the identifier and the subsequent #, ", or '
    to avoid errors.
  • Edition migrations will help you insert whitespace in such cases.
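The whitespace-separated spellings still lex as separate tokens, which a small token-counting macro makes visible. This is a sketch assuming Rust 2021; `count_tts!` is a hypothetical helper written for illustration, and the unspaced forms (`#a#b`, `hello"world"`, `match"x"`) would be rejected by the lexer before any macro sees them.

```rust
// Hypothetical helper: counts the token trees passed to it.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // `#a #b` is four tokens: `#`, `a`, `#`, `b`.
    // Writing `#a#b` in Rust 2021 is an error, because `a#` is a reserved prefix.
    println!("{}", count_tts!(#a #b));
    // An identifier followed by a string literal: two tokens.
    println!("{}", count_tts!(hello "world"));
    // Keywords are not special-cased, so `match "x"` needs the space too.
    println!("{}", count_tts!(match "x" {}));
}
```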

Details

To make space for new syntax in the future, we've decided to reserve syntax for prefixed identifiers and literals: prefix#identifier, prefix"string", prefix'c', and prefix#123, where prefix can be any identifier. The exceptions are prefixes that already have a meaning, such as b'...' (byte literals) and r"..." (raw strings).
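For reference, the prefixes that already carry a meaning keep working unchanged. The sketch below shows a few of them; the function names are hypothetical and exist only so the values can be checked.

```rust
// `b'...'` is a byte literal: a single `u8`.
fn byte_literal() -> u8 {
    b'a'
}

// `b"..."` is a byte string literal: a `&'static [u8; N]`.
fn byte_string() -> &'static [u8; 3] {
    b"abc"
}

// `r"..."` is a raw string: backslashes are not escape characters.
fn raw_string() -> &'static str {
    r"C:\temp"
}

// `br"..."` combines both: raw bytes, no escape processing.
fn raw_byte_string() -> &'static [u8] {
    br"\n"
}

fn main() {
    println!(
        "{} {:?} {} {:?}",
        byte_literal(),
        byte_string(),
        raw_string(),
        raw_byte_string()
    );
}
```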

This provides syntax we can expand into in the future without requiring an edition boundary. We may use this for temporary syntax until the next edition, or for permanent syntax if appropriate.

Without an edition, this would be a breaking change, since macros can currently accept syntax such as hello"world", which they will see as two separate tokens: hello and "world". The (automatic) fix is simple though: just insert a space: hello "world". Likewise, prefix#ident should become prefix #ident. Edition migrations will help with this fix.
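As a minimal sketch of the migration, consider a hypothetical macro whose matcher expects an identifier followed by a string literal. In Rust 2021, `greet!(hello"world")` fails to lex because `hello"` is a reserved prefix; the migrated form with a space is matched exactly as before.

```rust
// Hypothetical macro: an identifier followed by a string literal.
macro_rules! greet {
    ($name:ident $msg:literal) => {
        format!("{}: {}", stringify!($name), $msg)
    };
}

fn main() {
    // The inserted space keeps `hello` and `"world"` as two separate tokens.
    println!("{}", greet!(hello "world"));
}
```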

Other than turning these into a tokenization error, the RFC does not attach a meaning to any prefix yet. Assigning meaning to specific prefixes is left to future proposals, which, thanks to reserving these prefixes, will no longer be breaking changes.

Some new prefixes you might potentially see in the future (though we haven't
committed to any of them yet):

  • k#keyword to allow writing keywords that don't exist yet in the current edition. For example, while async is not a keyword in edition 2015, this prefix would've allowed us to accept k#async in edition 2015 without having to wait for edition 2018 to reserve async as a keyword.
  • f"" as a short-hand for a format string. For example, f"hello {name}" as a short-hand for the equivalent format!() invocation.
  • s"" for String literals.
  • c"" or z"" for null-terminated C strings.
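None of these prefixes is implemented by this change. As a rough illustration only, the sketch below spells out in stable Rust the kind of expression each was floated as shorthand for; the function names are hypothetical.

```rust
use std::ffi::CStr;

// Roughly what `f"hello {name}"` might stand for: a `format!()` call.
fn formatted(name: &str) -> String {
    format!("hello {}", name)
}

// Roughly what `s"hello"` might stand for: an owned `String`.
fn owned() -> String {
    String::from("hello")
}

// Roughly what `c"hello"` or `z"hello"` might stand for: a NUL-terminated C string.
fn c_string() -> &'static CStr {
    CStr::from_bytes_with_nul(b"hello\0").expect("exactly one trailing NUL")
}

fn main() {
    println!("{} {} {:?}", formatted("world"), owned(), c_string());
}
```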

How unresolved questions were resolved and other interesting developments

Where and how to enforce prefixes

The biggest question was where to enforce the prefixes and emit errors. We ultimately opted to emit errors in the lexer, which meant that the lexer had to become aware of the current edition. There was an alternative of using "jointness" and enforcing the conditions in the parser. The idea was to leverage the fact that Rust tokens (at least some subset of them) record not only their content but whether they are separated by whitespace from the next token. This was intended to enable compound operators like << to be parsed as two < tokens in some parts of the parser (types) and as a single token elsewhere (expressions), without the lexer having to know what state the parser was in. This same approach could conceptually be used so that the lexer doesn't have to know the edition.
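The context dependence described above can be seen in ordinary code: the same characters mean different things in type position and in expression position. This is a sketch; `HasAssoc` and `Foo` are hypothetical, and it does not show how rustc represents these tokens internally.

```rust
trait HasAssoc {
    type Assoc;
}

struct Foo;

impl HasAssoc for Foo {
    type Assoc = u8;
}

// In type position, `<<` is read as two `<`: the start of the generic
// arguments and the start of a qualified path. Likewise, the closing `>>`
// of `Vec<Vec<u8>>` is read as two `>`.
fn nested() -> (Vec<<Foo as HasAssoc>::Assoc>, Vec<Vec<u8>>) {
    (vec![1, 2], vec![vec![3]])
}

// In expression position, `<<` is the left-shift operator.
fn shifted() -> u32 {
    1u32 << 3
}

fn main() {
    println!("{:?} {}", nested(), shifted());
}
```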

As described in detail in this writeup, however, the jointness approach had several downsides. For example, it meant that lexing of literals was independent of prefix: we might like f"{foo("bar")}" to be lexed as a single string, but that is not possible unless the lexer knows that an f string can contain embedded expressions. Similarly, which escape codes the lexer accepts depends on the prefix (e.g. \x escapes above \x7F are valid in b"" but not in ""). (This is especially relevant for raw strings: whether fr"\" is accepted or not depends on what meaning we assign to fr.) Jointness also had forward-compatibility hazards with macro arm ordering. Finally, the lexer-based approach can be converted to a jointness-based approach later, since it currently gives errors much earlier in the process, and accepting code that used to be an error is backwards compatible.
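To make the escape-code point concrete, the sketch below shows that what a literal's body means already depends on its prefix in today's Rust. The function names are hypothetical.

```rust
// `\xFF` is accepted in a byte string, where it denotes the byte 0xFF.
fn byte_escape() -> &'static [u8] {
    b"\xFF"
}

// In an ordinary string, `\x` escapes are limited to `\x00`..=`\x7F`;
// writing "\xFF" here would be a compile error.
fn str_escape() -> &'static str {
    "\x7F"
}

// In a raw string, a backslash is just a backslash, so `r"\"` is a
// one-character string rather than an unterminated literal.
fn raw_backslash() -> &'static str {
    r"\"
}

fn main() {
    println!("{:?} {:?} {:?}", byte_escape(), str_escape(), raw_backslash());
}
```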

There were also advantages to jointness: it would allow more procedural macro prototyping, and it means that the lexer would remain independent of edition.

Edition used for procedural macro APIs

There are some procedural macro APIs that lex tokens from strings. Those APIs have not traditionally taken a span or other information from which an edition can be derived. Those APIs will be documented with the Edition that they use to do lexing. In the future we may wish to add new APIs that take a Span or other parameter and use that to derive the Edition.

@nikomatsakis nikomatsakis added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Aug 18, 2021
@nikomatsakis
Contributor Author

@rfcbot fcp merge

I propose that we stabilize reserved prefixes. Note that they will not be exposed to stable completely until Rust 2021 is stabilized.

@rfcbot

rfcbot commented Aug 18, 2021

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Aug 18, 2021
@nikomatsakis
Contributor Author

@rfcbot concern no-reference-PR

We need to describe these changes in the Rust reference!

@m-ou-se
Member

m-ou-se commented Aug 18, 2021

Edition used for procedural macro APIs

There are some procedural macro APIs that lex tokens from strings. Those APIs have not traditionally taken a span or other information from which an edition can be derived. Those APIs will be documented with the Edition that they use to do lexing. In the future we may wish to add new APIs that take a Span or other parameter and use that to derive the Edition.

proc_macro::TokenStream::from_str and proc_macro::Literal::from_str are affected, and now use the edition of the proc macro definition. (They use the call_span for all tokens they produce, which carries the source location of the invocation, but the edition of the proc macro definition.)

This affects quote::quote!{}, which will have to be updated to Rust 2021 before you can use prefixed things with it. That's not relevant right now since all the prefixes are reserved anyway. (It becomes relevant once we add z"" or something.)

@m-ou-se
Member

m-ou-se commented Aug 23, 2021

This FCP needs to start today to make it in time for 1.56.

@m-ou-se
Member

m-ou-se commented Aug 23, 2021

We need to describe these changes in the Rust reference!

@nikomatsakis I don't think that needs to block the start of the FCP, right? Can you resolve your concern? We can handle the reference while the FCP is in progress.

@nikomatsakis
Contributor Author

@rfcbot resolve no-reference-PR

I'm going to mark this as resolved for now but try to get this done ASAP

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Aug 23, 2021
@rfcbot

rfcbot commented Aug 23, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Sep 2, 2021
@rfcbot

rfcbot commented Sep 2, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@rfcbot rfcbot added the to-announce Announce this issue on triage meeting label Sep 2, 2021
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Sep 18, 2021
@joshtriplett
Member

AFAICT this is done.
