Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul MacArgs #96546

Merged
merged 6 commits into from
May 4, 2022
Merged

Overhaul MacArgs #96546

merged 6 commits into from
May 4, 2022

Conversation

nnethercote
Copy link
Contributor

Motivation:

  • Clarify some code that I found hard to understand.
  • Eliminate one use of three places where TokenKind::Interpolated values are created.

r? @petrochenkov

This commit rearranges the `match`. The new code avoids testing for
`MacArgs::Eq` twice, at the cost of repeating the `self.print_path()`
call. I think this is worthwhile because it puts the `match` in a more
standard and readable form.
Because `Nonterminal` is the type it visits.
The `token` is always an interpolated non-terminal expression, and
always a literal in valid code. This commit simplifies the processing
accordingly, by directly extracting and using the literal.
The two paths are equivalent -- they both end up calling `visit_expr()`.
I have kept the more restrictive path, the one that requires that
`token` be an expression nonterminal. (The next commit will simplify this
function further.)
@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 29, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 29, 2022
@rust-log-analyzer

This comment has been minimized.

@petrochenkov
Copy link
Contributor

petrochenkov commented Apr 29, 2022

I don't think it's a right direction to fine-tune MacArgs::Eq to what is currently accepted.
IIRC, right now the implementation should mostly continue working as is if we start accepting #[key = 1 + 2] tomorrow.

Conceptually the value in #[key = VALUE] is a TokenStream first of all, just as in #[key(VALUE)]/#[key { VALUE }]/etc, it just happens to be packed into a single nonterminal token because it needs to be visited as an ast::Expr during expansion.
Once it's expanded, it's a #[key = ARBITRARY_TOKEN_STREAM] and can be encoded as such.
UPD: Before #80441 MacArgs::Eq literally contained a TokenStream and I'm still not sure that #80441 was a good change.

I'll look at the changes in detail a bit later.

@nnethercote
Copy link
Contributor Author

Conceptually the value in #[key = VALUE] is a TokenStream first of all, just as in #[key(VALUE)]/#[key { VALUE }]/etc, it just happens to be packed into a single nonterminal token because it needs to be visited as an ast::Expr during expansion.
Once it's expanded, it's a #[key = ARBITRARY_TOKEN_STREAM] and can be encoded as such.

The grammar says that VALUE is an expression. Presumably it can't become anything other than an expression later on?

As I mentioned above, my main motivation here is to get rid of one of the TokenKind:Interpolated occurrences. I tried a few different approaches, this was the one that ended up working. The fact that the Lo form contains a literal is consistent with how the current code works. I also tried keeping VALUE as an expression after lowering but there were some difficulties, e.g.:

  • Expr doesn't implement HashStable. This is why I have a hand-written HashStable impl for MacArgs.
  • There was a problem with decoding LazyTokenStream, I don't remember the exact details. But this may have been before I realized that parse_mac_args_common could change from using parse_expr_force_collect to parse_expr.

It might still be possible to get rid of the Hi/Lo distinction and always store VALUE as an expression, which would be nice. Basically, I'm open to all approaches that get rid of the Interpolated :)

@petrochenkov
Copy link
Contributor

The grammar says that VALUE is an expression.

A token stream that must fit into the expression grammar.

We actually have something pretty similar in the language, in

#[unexpanded_attribute_macro]
EXPR

EXPR is a token stream that must fit into the expression grammar.

In practice, while parsing such a stream we can build an ast::Expr immediately, and then carry it around with the token stream preserved inside it in expr.tokens.
Then to pass the tokens to the macro when the time comes, we obtain them using Annotatable::into_tokens, which wraps the expression into NtExpr and then uses nt_to_tokenstream on it (*).

So we can do the same thing with expressions in key-value attributes and represent them as MacArgs::Eq(Span, ast::Expr) with the token stream being preserved inside the ast::Expr.
Then fn inner_tokens can panic when encountering MacArgs::Eq and when it's really necessary the tokens would be obtained with nt_to_tokenstream.

If rustc_metadata can turn ast::Expr to tokens before encoding and parse it back to ast::Expr during decoding, then we don't even need the Lo/Hi variants.
Due to crate dependencies it may require keeping some function pointers to the to_tokens/from_tokens functions, like nt_to_tokenstream is currently kept in ItemLowerer.
Although we can just keep the Lo and Hi as this PR does, it doesn't look too bad (I'd call them Ast and Hir though).

(*) The use of Nonterminal there is dispensable, it can be avoided if the analogue of nt_to_tokenstream is implemented individually for all the relevant AST structures. I have an implementation for that in petrochenkov@8a3e257 and will submit it soon after some code cleanup. So MacArgs::Eq -> tokens conversion can employ nt_to_tokenstream for now.

@petrochenkov
Copy link
Contributor

So in this PR we generally need to get rid of the assumption that the expression is always a single literal.
I.e. collect tokens during the attribute parsing and use nt_to_tokenstream for obtaining tokens when they are necessary.

One exception is MacArgsEqKind::Hir, I thinks its fine to keep a Lit there for now, we'll be able to change it to TokenStream when more complex expressions actually start being allowed in these attributes.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 1, 2022
@nnethercote
Copy link
Contributor Author

The grammar says that VALUE is an expression.

A token stream that must fit into the expression grammar.

Isn't that trivially true for any AST fragment kind? E.g. an item is created from a token stream that must fit in the item grammar, a statement is created from a token stream that must fit into the statement grammar, etc. Or is there something special about expressions here that I'm overlooking?

@nnethercote
Copy link
Contributor Author

Thank you for the detailed explanation.

I've uploaded a new commit that addresses some of these comments.

  • I renamed Hi,Lo as Ast,Hir.
  • I didn't use nt_to_tokenstream in MacArgs::inner_tokens because (a) it's not currently necessary, and (b) it's a bit of a pain to do so, because rustc_ast can't easily use code from rustc_parse.
  • I didn't change parse_mac_args_common back to using parse_collect_force_collect because (a) it's not currently necessary, and (b) there's no need to run code and collect data that isn't being used.

However, to ease the transition to a possible future state that allows more than literals, I added a comment to MacArgs::inner_tokens explaining the latter two points and the changes that would be necessary.

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 2, 2022
@petrochenkov
Copy link
Contributor

Isn't that trivially true for any AST fragment kind?

Yes, I meant any node under a macro attribute.

#[unexpanded_attribute_macro]
NODE

I used expression node as an example because key-value attributes also use expressions.

compiler/rustc_ast/src/ast.rs Outdated Show resolved Hide resolved
compiler/rustc_ast/src/attr/mod.rs Outdated Show resolved Hide resolved
compiler/rustc_ast/src/ast.rs Outdated Show resolved Hide resolved
compiler/rustc_ast/src/ast.rs Outdated Show resolved Hide resolved
compiler/rustc_ast/src/ast.rs Outdated Show resolved Hide resolved
compiler/rustc_ast/src/attr/mod.rs Outdated Show resolved Hide resolved
compiler/rustc_ast_pretty/src/pprust/state.rs Outdated Show resolved Hide resolved
compiler/rustc_ast_pretty/src/pprust/state.rs Show resolved Hide resolved
compiler/rustc_parse/src/validate_attr.rs Outdated Show resolved Hide resolved
compiler/rustc_parse/src/validate_attr.rs Outdated Show resolved Hide resolved
@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 2, 2022
@nnethercote
Copy link
Contributor Author

Thanks for the detailed review comments. I have addressed them all, except where I have explained above.

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 3, 2022
@rust-log-analyzer

This comment has been minimized.

/// historical reasons. We'd like to get rid of it, for multiple reasons.
/// - It's conceptually very strange. Saying a token can contain an AST
/// node is like saying, in natural language, that a word can contain a
/// sentence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proc_macro-style tokens contain "sentences" (delimited groups) in them all the time, but for flattened rustc_parse-style tokens it's unusual, I guess.
When parser librarification was still in plans, one of the alternatives was to switch rustc_parse to the proc_macro-style token model.

The main reason to get rid of Interpolated tokens from my point of view is that we want to define everything happening during macro expansion in terms of token streams, so we have to treat the AST pieces as token streams at least in some cases, moreover the token stream representation is the source of truth for them.
So these AST pieces are supposed to be a parser caching mechanism at most, which is highly non-obvious when they are represented like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're talking about token trees, rather than tokens, right?

The main reason to get rid of Interpolated tokens from my point of view ...

That's a good reason as well :)

compiler/rustc_ast/src/ast.rs Outdated Show resolved Hide resolved
@petrochenkov
Copy link
Contributor

r=me after squashing commits (all of this is basically changing representation of MacArgs::Eq and then dealing with the consequences).

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 4, 2022
The value in `MacArgs::Eq` is currently represented as a `Token`.
Because of `TokenKind::Interpolated`, `Token` can be either a token or
an arbitrary AST fragment. In practice, a `MacArgs::Eq` starts out as a
literal or macro call AST fragment, and then is later lowered to a
literal token. But this is very non-obvious. `Token` is a much more
general type than what is needed.

This commit restricts things, by introducing a new type `MacArgsEqKind`
that is either an AST expression (pre-lowering) or an AST literal
(post-lowering). The downside is that the code is a bit more verbose in
a few places. The benefit is that makes it much clearer what the
possibilities are (though also shorter in some other places). Also, it
removes one use of `TokenKind::Interpolated`, taking us a step closer to
removing that variant, which will let us make `Token` impl `Copy` and
remove many "handle Interpolated" code paths in the parser.

Things to note:
- Error messages have improved. Messages like this:
  ```
  unexpected token: `"bug" + "found"`
  ```
  now say "unexpected expression", which makes more sense. Although
  arbitrary expressions can exist within tokens thanks to
  `TokenKind::Interpolated`, that's not obvious to anyone who doesn't
  know compiler internals.
- In `parse_mac_args_common`, we no longer need to collect tokens for
  the value expression.
@nnethercote
Copy link
Contributor Author

I have squashed the relevant commits.

@bors r=petrochenkov

@bors
Copy link
Contributor

bors commented May 4, 2022

📌 Commit baa18c0 has been approved by petrochenkov

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 4, 2022
@bors
Copy link
Contributor

bors commented May 4, 2022

⌛ Testing commit baa18c0 with merge 4c60a0e...

@bors
Copy link
Contributor

bors commented May 4, 2022

☀️ Test successful - checks-actions
Approved by: petrochenkov
Pushing 4c60a0e to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 4, 2022
@bors bors merged commit 4c60a0e into rust-lang:master May 4, 2022
@rustbot rustbot added this to the 1.62.0 milestone May 4, 2022
@nnethercote nnethercote deleted the overhaul-MacArgs branch May 5, 2022 00:12
@bors bors mentioned this pull request May 5, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4c60a0e): comparison url.

Summary:

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 1 1 16 9 17
mean2 0.3% 0.2% -0.8% -0.2% -0.8%
max 0.3% 0.2% -1.4% -0.3% -1.4%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

flip1995 pushed a commit to flip1995/rust that referenced this pull request May 5, 2022
…chenkov

Overhaul `MacArgs`

Motivation:
- Clarify some code that I found hard to understand.
- Eliminate one use of three places where `TokenKind::Interpolated` values are created.

r? `@petrochenkov`
@@ -1536,11 +1536,20 @@ pub enum MacArgs {
Eq(
/// Span of the `=` token.
Span,
/// "value" as a nonterminal token.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants