Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is a trailing expression in a block exactly? #61733

Open
petrochenkov opened this issue Jun 11, 2019 · 18 comments
Open

What is a trailing expression in a block exactly? #61733

petrochenkov opened this issue Jun 11, 2019 · 18 comments
Assignees
Labels
A-frontend Area: Compiler frontend (errors, parsing and HIR) A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@petrochenkov
Copy link
Contributor

Is it determined syntactically or semantically?
Before or after macro expansion?

Answering these questions is necessary to specify expansion of macros (stable fn-like ones or unstable attribute ones) in expression and statement positions.
The current implementation is sometimes inconsistent.

Below I'll be dumping some code examples expanded using different expansion models in hope to come up with some rules that are both self-consistent and backward compatible.

cc #33953

@petrochenkov petrochenkov added A-frontend Area: Compiler frontend (errors, parsing and HIR) A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jun 11, 2019
@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jun 11, 2019

Macros used in the examples:

macro empty() {}
macro expr() { 0 }
macro stmt() { 0 ; }
macro stmt_expr() { 0 ; 1 }

and equivalent attribute macros producing the same tokens.

Definitions:

"Token-based expansion" - macro invocation tokens are replaced by tokens produced by the macro, without knowing anything about AST-based context of that macro invocation (whether it's a macro in item position, or in expression position, etc).

Having an expansion model like this is necessary if we want to perform eager expansion.
Suppose that we somehow identified the macro invocation stmt!() in the next token stream:

0 . id 2 stmt!() ; bar +

To expand it we replace it with its produced tokens without having any idea about the context

0 . id 2 stmt!() ; bar +

=>

0 . id 2 0 ; ; bar +

"AST-based expansion" - macro invocation knows its AST node kind (whether it's a macro expression, or a macro item, or something else) and its produced tokens (+ perhaps some other neighbouring tokens) are immediately (re-)parsed as one or multiple AST nodes of that kind, then the new nodes replace the old node.
That's how current expansion in the compiler works, more or less.

stmt!(); // Statement kind

=> 

0 ; // Produced tokens, parsed as a statement `0;`

=>

0; // The parsed statement replaces the previous statement `stmt!();`

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jun 11, 2019

fn main() {
    { 2 }

    empty!();
}

Token-based expansion:

fn main() {
    { 2 } // <- trailing expression if only "output(empty!())  ;" is re-parsed after the macro expansion
           // (assuming empty statements are ignored when determining trailing-ness)
           // <- not a trailing expression if some larger context is re-parsed after the macro expansion

    ; // An "empty statement" if only "output(empty!())  ;" is re-parsed after the macro expansion
      // A part of "{ 2 };" statement if some larger context is re-parsed after the macro expansion
}

AST-based expansion:

fn main() {
    { 2 } // <- trailing expression

    // Stmt( empty!(); ) -> [/* empty stmt list */]
    /* Nothing */
}

Compiler's behavior:
{ 2 } is a trailing expression.

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jun 11, 2019

fn main() {
    { 2 }

    expr!();
}

Token-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    0 ; // A non-empty statement, not a trailing expression
}

AST-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    // Stmt( expr!(); ) -> [Stmt(0;)]
    0; // A non-empty statement, not a trailing expression

    // ^^^ Assuming "output(expr!()) ;" is re-parsed and not only "output(expr!())".
    // In the second case ";" would be a separate empty statement, and "0" would be a trailing expression 
}

Compiler's behavior:
{ 2 } is not a trailing expression, 0 is not a trailing expression.

@petrochenkov
Copy link
Contributor Author

fn main() {
    { 2 }

    stmt!();
}

Token-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    0 ; ; // A non-empty statement not a trailing expression, followed by empty statement
}

AST-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    // Stmt( stmt!(); ) -> [Stmt(0;)]
    0; // A non-empty statement, not a trailing expression
}

Compiler's behavior:
{ 2 } is not a trailing expression, 0 is not a trailing expression.

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jun 11, 2019

fn main() {
    { 2 }

    stmt_expr!();
}

Token-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    0 ; 1 ; // A non-empty statement not a trailing expression, followed by a non-empty statement not a trailing expression
}

AST-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    // Stmt( stmt_expr!(); ) -> [Stmt(0;), Stmt(1;)]
    0; 1; // A non-empty statement not a trailing expression, followed by a non-empty statement not a trailing expression
}

Compiler's behavior:
{ 2 } is not a trailing expression, 1 is not a trailing expression.

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jun 11, 2019

fn main() {
    { 2 }

    empty!()
}

Token-based expansion:

fn main() {
    { 2 } // <- trailing expression

    /* Nothing */
}

AST-based expansion 1 (empty!() has statement kind):

fn main() {
    { 2 } // <- trailing expression

    // Stmt( empty!() ) -> [/* empty stmt list */]
    /* Nothing */
}

AST-based expansion 2 (empty!() has expression kind):

fn main() {
    { 2 } // N/A

    // Expr( empty!() ) -> ParseError (empty token stream is not an expression)
    /* Nothing */
}

Compiler's behavior:
Parse error error: macro expansion ends with an incomplete expression: expected expression.

@petrochenkov
Copy link
Contributor Author

fn main() {
    { 2 }

    expr!()
}

Token-based expansion:

fn main() {
    { 2 } // not a trailing expression

    0 // trailing expression
}

AST-based expansion 1 (expr!() has statement kind):

fn main() {
    { 2 } // not a trailing expression

    // Stmt( expr!() ) -> [Stmt(0)]
    0 // trailing expression
}

AST-based expansion 2 (expr!() has expression kind):

fn main() {
    { 2 } // not a trailing expression

    // Expr( expr!() ) -> Expr(0)
    0 // trailing expression
}

Compiler's behavior:
{ 2 } is not a trailing expression, 0 is a trailing expression.

@jonas-schievink jonas-schievink added the A-parser Area: The parsing of Rust source code to an AST label Jun 13, 2019
@petrochenkov
Copy link
Contributor Author

fn main() {
    { 2 }

    stmt!()
}

Token-based expansion:

fn main() {
    { 2 } // not a trailing expression

    0 ; // not a trailing expression
}

AST-based expansion 1 (stmt!() has statement kind):

fn main() {
    { 2 } // not a trailing expression

    // Stmt( stmt!() ) -> [Stmt(0;)]
    0; // not a trailing expression
}

AST-based expansion 2 (stmt!() has expression kind):

fn main() {
    { 2 } // not a trailing expression

    // Expr( stmt!() ) -> ParseError (`0;` is not an expression)
    /* nothing */
}

Compiler's behavior:
{ 2 } is not a trailing expression, 0 is a trailing expression.
This doesn't match any model and is clearly a bug (#33953).

@petrochenkov
Copy link
Contributor Author

fn main() {
    { 2 }

    stmt_expr!()
}

Token-based expansion:

fn main() {
    { 2 } // not a trailing expression

    0 ; 1 // trailing expression
}

AST-based expansion 1 (stmt_expr!() has statement kind):

fn main() {
    { 2 } // not a trailing expression

    // Stmt( stmt_expr!() ) -> [Stmt(0;), Stmt(1)]
    0; 1 // trailing expression
}

AST-based expansion 2 (stmt_expr!() has expression kind):

fn main() {
    { 2 } // not a trailing expression

    // Expr( stmt_expr!() ) -> ParseError (`0; 1` is not an expression)
    /* nothing */
}

Compiler's behavior:
Parse error: error: macro expansion ignores token 1 and any following

@petrochenkov
Copy link
Contributor Author

For attribute macros token-based and AST-based interpretations are the same as for fn-like macros, so I'm just going to list the actual compiler's behaviors.

cfg executes separate compiler's logic from other attribute macros, so it's tested here separately.

fn main() {
    { () } // trailing expression

    #[cfg(FALSE)]
    0; // not trailing expression
}

fn main() {
    { () } // trailing expression

    #[cfg(FALSE)]
    0 // not trailing expression
}

fn main() {
    { () } // not trailing expression

    #[cfg(TRUE)]
    0; // not trailing expression
}

fn main() {
    { () } // not trailing expression

    #[cfg(TRUE)]
    0 // trailing expression
}

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jul 8, 2019

so I'm just going to list the actual compiler's behaviors.

fn main() {
    { () } // trailing expression

    #[empty]
    0; // not trailing expression
}
fn main() {
    { () } // trailing expression

    #[empty]
    0 // not trailing expression
}
fn main() {
    { () } // not trailing expression

    #[expr]
    0; // trailing expression (the attribute transforms the whole statement, including semicolon)
        // so it's equivalent to `expr!( 0 ; )`
}
fn main() {
    { () } // not trailing expression

    #[expr]
    0 // trailing expression
}
fn main() -> u8 {
    { () } // not trailing expression

    #[stmt]
    0; // not trailing expression
}
fn main() -> () {
    { () } // not trailing expression

    #[stmt]
    0 // error: macro expansion ignores token `;` and any following
}
fn main() -> () {
    { () } // not trailing expression

    #[stmt_expr]
    0; // trailing expression (the attribute transforms the whole statement, including semicolon)
        // so it's equivalent to `stmt_expr!( 0 ; )`
}
fn main() -> () {
    { () } // not trailing expression

    #[stmt_expr]
    0 // error: macro expansion ignores token `;` and any following
}

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jul 9, 2019

Proposed resolution:

  • Use the token-based model, but without reparsing the whole world after expanding a single macro.
  • Introduce the notion of "reparse context" of a macro invocation including the invocation itself and some of its neighbouring tokens.
    Only tokens from the reparse context are reparsed after the macro invocation produces tokens.

Non-eager expansion

When doing a regular macro expansion we have a partially built AST with some nodes in it being unexpanded macros.
So we take the macro node, expand the macro, and replace the node with the expansion results somehow.

In the token-based model macro expansion produces a token stream that needs to be converted into AST somehow.
In theory, the token parsing depends on context and we could reparse the whole crate after a single macro is expanded.

macro foo() { 2 + 3 }
fn main() { 1 * foo!() }

=> expand and reparse the whole crate =>

macro foo() { 2 + 3 }
fn main() { 1 * 2 + 3 } // Oh, wait

For multiple reasons, starting with "operator priority hygiene" (as in the example above), and ending with performance we want to reparse as few tokens as possible.

Proposal 1: reparse context includes only tokens from the macro invocation's AST node.

// Example: macro invocation in a statement node

// <reparse_context>
#[inert_attrs] // optional
// <invocation>
foo!()
// </invocation>
; // optional
// </reparse_context>

So, we are going to reparse the tokens produced by the macro together with their "closest environment".
(I won't talk about the inert attributes here, they don't currently work according to this model, and we perhaps can make them to, but that's a separate question.)

The reparse context tokens are reparsed as multiple statements and the original statement node is replaced with them.

Proposal 2: treat the trailing semicolon-less macro invocation as a statement rather than an expression like it's sometimes treated now.

So its produced tokens could be parsed as multiple statements as well.

Proposal 3: introduce an empty statement ;.

We need it in the token model.
empty!(); statement in particular turns into a single empty statement after expansion.

Eager expansion

0 . id 2 foo!() ; bar +

Reparse context either consists of the invocation only, or there is a programmatic way to mark some neighbouring tokens as belonging to it.
I don't won't to speculate about the details, but the general idea is that the concept of reparse context can be expanded to eager expansion as well.

Examples: What changes and what continues working

See the next comments

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jul 9, 2019

What stops working:

fn main() {
    // `0` inside `stmt!()` is no longer a trailing expression
    // Fixes https://github.com/rust-lang/rust/issues/33953
    stmt!()
}

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jul 9, 2019

What starts working:

fn main() {
    // fn main() {}
    empty!()
}
fn main() {
    // fn main() { 0; 1 }, trailing
    stmt_expr!()
}
fn main() {
    // fn main() { 0; }, non-trailing
    #[stmt]
    0
}
fn main() {
    // fn main() { 0; 1 }, trailing
    #[stmt_expr]
    0
}

The reason is that semicolon-less macro statements can now expand into multiple statements.

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jul 9, 2019

Interesting cases that work now and keep working:

fn foo() -> u8 {
    { 0 } // <- trailing expression

    empty!();
}

In the "reparse everything" model this would expand into fn foo() -> u8 { { 0 } ; } and { 0 } would stop being a trailing expression.
However, we do not reparse everything, only the current node (#61733 (comment))!
So we get two statements after the expansion - [Stmt({ 0 }), Stmt(;)].

We can introduce the rule that all empty statements are thrown away when determining the trailing expression to make { 0 } trailing.
We can introduce the same rule for item statements as well

fn foo() -> u8 {
    { 0 } // <- trailing?

    fn bar() {}
}

, but that's not strictly necessary for backward compatibility.

@Aaron1011
Copy link
Member

Aaron1011 commented Oct 25, 2020

We can introduce the rule that all empty statements are thrown away when determining the trailing expression to make { 0 } trailing.
We can introduce the same rule for item statements as well

@petrochenkov: Assuming that it doesn't cause much (or any breakage), I think it would be better to not ignore an empty statement when determining the trailing expression. That is:

fn foo() -> u8 {
    { 0 }

    empty!(); //~ ERROR: mismatched types
}

would fail to typecheck, since we have a trailing (statement) expression of ; with type ().

There are couple of reasons I think we should prefer this:

  1. Consistency with other trailing statements. All of these currently fail to compile:
fn trailing_item() -> bool {
    { true }
    fn inner() {}
}

fn trailing_stmt() -> bool {
    { true }
    let a = 1;
}

fn trailing_smei() -> bool {
    { true }
    ;
}

Of course, these are parsed somewhat differently (the newline in fn trailing_semi() doesn't matter, and we just have the single statement { true };). However, the visual appearance of the code matches up with the behavior - there's nothing between the trailing expression and the closing brace.

  1. Reducing the dependence on the implementation details of a proc macro

If the user writes:

fn foo() {
    { 0 }

    empty!(); //~ ERROR: mismatched types
}

Then someone reading the code will get the impression that this function returns a value of (). However, the value returned by the function actually depends on what empty! expands to - if it expands to nothing, the function return { 0 }: otherwise, the function returns (). If the function has a generic return type, both versions might actually compile. A macro might legitimately expand to nothing (e.g. based on features or platform #[cfg]s), so this could come up in real code.

Of course, doing this is a breaking change, since this currently compiles:

macro_rules! empty {
    () => { }
}

fn foo() -> bool {
    { true }
    empty!();
}

@petrochenkov: Assuming you don't object to the idea, I'll do a Crater run to get an idea of how much breakage this might cause.

With a bit of effort, I think I could come up with a future-incompatibility lint that would fire on any functions that would have their behavior changed by this (closures are a different story). During lowering, we would mark 'former werid trailing expressions' like { true }. If typecheck succeeds, we try to unify the return type with the type of the 'former weird trailing expression' (if it exists). If unification succeeds (e.g. a function returns an impl trait), we emit a warning.

@petrochenkov
Copy link
Contributor Author

@Aaron1011
I do not object, we can try making it an error if its use in practice is rare enough.

@Aaron1011
Copy link
Member

Aaron1011 commented Oct 27, 2020

One consequence of the proposed resolution in #61733 (comment):

As described in #61733 (comment), we will properly handle trailing semicolons in macro_rules! macros (fixing #33953). That is, the following code will compile:

fn main() {
    macro_rules! a {
        ($e:expr) => { $e; }
    }
    a!(true)
}

This code will continue to compile (note the semicolon after a!(true)). It will expand to true;;, which is a statement true; following by an empty statement ;

fn main() {
    macro_rules! a {
        ($e:expr) => { $e; }
    }
    a!(true);
}

However, what should happen to this code is less clear:

fn main() {
    macro_rules! a {
        ($e:expr) => { $e; }
    }
    let _val = a!(true);
}

This will expand to let _val = true;;, which compiles if written literally by the user. However, making this work would require reparsing the entire let statement (in the current implementation, we could just keep the existing AST node and append an empty ; statement). This is because a!(true) occurs in expression position, where semicolons are not allowed. That is, the code will be parsed as [let a = [true;]];, not [let a = true];;

I think rejecting this code is most consistent with the idea of a 'reparse context'. If we switch to token-based expansion (e.g. not constructing intermediate AST nodes), then allowing this would require reparsing arbitrarily many preceding tokens. For example, we could have let [StructPat { .. }, StructPat { field: OtherStruct { .. }}] = mac!(); - reparsing all of this would go against the idea of a local 'reparse context'.

However, this may be somewhat surprising to users, let _val = a!(true); and a!(true); look very similar under a purely token-based (i.e. reparse the world) approach: both expand to sequence with two trailing semicolons.

Aaron1011 added a commit to Aaron1011/rust that referenced this issue Nov 2, 2020
See rust-lang#61733 (comment)

We now preserve the trailing semicolon in a macro invocation, even if
the macro expands to nothing. As a result, the following code no longer
compiles:

```rust
macro_rules! empty {
    () => { }
}

fn foo() -> bool { //~ ERROR mismatched
    { true } //~ ERROR mismatched
    empty!();
}
```

Previously, `{ true }` would be considered the trailing expression, even
though there's a semicolon in `empty!();`

This makes macro expansion more token-based.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Nov 3, 2020
…expr, r=petrochenkov

Treat trailing semicolon as a statement in macro call

See rust-lang#61733 (comment)

We now preserve the trailing semicolon in a macro invocation, even if
the macro expands to nothing. As a result, the following code no longer
compiles:

```rust
macro_rules! empty {
    () => { }
}

fn foo() -> bool { //~ ERROR mismatched
    { true } //~ ERROR mismatched
    empty!();
}
```

Previously, `{ true }` would be considered the trailing expression, even
though there's a semicolon in `empty!();`

This makes macro expansion more token-based.
JohnTitor added a commit to JohnTitor/rust that referenced this issue Nov 3, 2020
…expr, r=petrochenkov

Treat trailing semicolon as a statement in macro call

See rust-lang#61733 (comment)

We now preserve the trailing semicolon in a macro invocation, even if
the macro expands to nothing. As a result, the following code no longer
compiles:

```rust
macro_rules! empty {
    () => { }
}

fn foo() -> bool { //~ ERROR mismatched
    { true } //~ ERROR mismatched
    empty!();
}
```

Previously, `{ true }` would be considered the trailing expression, even
though there's a semicolon in `empty!();`

This makes macro expansion more token-based.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-frontend Area: Compiler frontend (errors, parsing and HIR) A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants