Parse error recovery is observable by macros in several cases #103534
Comments
We probably should be a lot more conservative with recoveries in macros. This reminds me of #103224.
I will put up a PR to add some more info to the parser to make these checks easier in the future.
@Nilstrieb and I discussed this a bit. Nils will add a flag to the parser state that indicates when recovery should be enabled, and the parser will only recover when that flag is set. All existing parser recoveries should then eventually be gated behind this flag, so that errors are properly bubbled up within macros and we can fall through to subsequent matcher arms. For example, given this:
```rust
macro_rules! blah {
    ($expr:expr) => {};
    (not $expr:expr) => {};
}
fn main() {
    blah!(not 1);
}
```
We should not eagerly recover here. In the case that no arms match successfully, we can retry all of the match arms with recovery enabled, which allows something like this:
```rust
macro_rules! blah {
    ($expr:expr) => {};
    (let) => {};
}
fn main() {
    blah!(not 1);
}
```
... to continue to give us a useful error message.
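The two-pass strategy described above (match every arm with recovery forbidden, and only retry with recovery allowed once all arms have failed) can be sketched as a toy model. This is a minimal sketch under stated assumptions, not rustc's implementation: `Recovery`, `ParseResult`, `parse_expr`, `match_arm`, `Arm` and `expand` are hypothetical names, tokens are plain strings, and the only "recovery" modeled is accepting `not x` as a misspelling of `!x`.

```rust
// Hypothetical toy model of "recovery only on retry"; none of these names
// are real rustc APIs. Tokens are plain strings for simplicity.

/// Whether the (toy) parser may recover from malformed input.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Recovery {
    Allowed,
    Forbidden,
}

#[derive(Debug, PartialEq)]
enum ParseResult {
    /// Parsed cleanly, with no diagnostics.
    Parsed,
    /// Parsed only by recovering; carries an illustrative diagnostic.
    Recovered(String),
    /// Could not parse; with recovery forbidden, the matcher moves on.
    Failed,
}

fn is_word(tok: &str) -> bool {
    !tok.is_empty() && tok.chars().all(|c| c.is_ascii_alphanumeric())
}

/// Toy `expr` grammar: a single alphanumeric word, or `!` followed by one.
/// With recovery allowed, `not <word>` is accepted as a typo for `!<word>`.
fn parse_expr(tokens: &[&str], recovery: Recovery) -> ParseResult {
    match tokens {
        [word] if is_word(word) => ParseResult::Parsed,
        ["!", word] if is_word(word) => ParseResult::Parsed,
        ["not", word] if recovery == Recovery::Allowed && is_word(word) => {
            ParseResult::Recovered("use `!` for logical negation".to_string())
        }
        _ => ParseResult::Failed,
    }
}

/// Toy arms mirroring the `blah!` examples above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Arm {
    Expr,    // ($expr:expr) => {};
    NotExpr, // (not $expr:expr) => {};
    Let,     // (let) => {};
}

fn match_arm(arm: Arm, tokens: &[&str], recovery: Recovery) -> ParseResult {
    match (arm, tokens) {
        (Arm::Expr, _) => parse_expr(tokens, recovery),
        (Arm::NotExpr, ["not", rest @ ..]) => parse_expr(rest, recovery),
        (Arm::Let, ["let"]) => ParseResult::Parsed,
        _ => ParseResult::Failed,
    }
}

/// Pass 1: recovery forbidden, so a failed parse just falls through to the
/// next arm. Pass 2 (only if no arm matched): recovery allowed, used solely
/// to produce a more helpful error; the macro call still fails.
fn expand(arms: &[Arm], tokens: &[&str]) -> Result<Arm, String> {
    for &arm in arms {
        if match_arm(arm, tokens, Recovery::Forbidden) == ParseResult::Parsed {
            return Ok(arm);
        }
    }
    for &arm in arms {
        if let ParseResult::Recovered(diagnostic) = match_arm(arm, tokens, Recovery::Allowed) {
            return Err(diagnostic);
        }
    }
    Err("no arm matched".to_string())
}
```

With arms `Expr, NotExpr`, the input `not x` reaches `NotExpr` because the first pass never recovers; with arms `Expr, Let`, nothing matches, and the second pass only serves to turn the failure into a friendlier message.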
This brings up a fun question. Should this code compile?
```rust
macro_rules! what {
    ($e:expr) => { compile_error!("no") };
    ($($tt:tt)*) => {};
}
fn wh() {
    what! {
        match 1;
    }
}
```
My knowledge about macros (and the two sentences the reference says about this) indicates that this should work: the first arm fails, but the second matches. But this doesn't compile, as the parser
So we not only have to gate all recovery directly, but also all emissions of diagnostics behind the flag. There are currently 165 references to
I believe it should compile, and that the changes I described above and discussed in your comment ("So we not only have to gate all recovery directly, but also all emissions of diagnostics behind the flag.") would be sufficient. Perhaps it's worthwhile to nominate this for lang team discussion, since the code you linked, @Nilstrieb, hasn't compiled since before 1.0 😅
Question for the lang team:
```rust
macro_rules! what {
    ($e:expr) => { compile_error!("no") };
    ($($tt:tt)*) => {};
}
fn wh() {
    what! {
        match 1;
    }
}
```
```rust
macro_rules! blah {
    ($expr:expr) => {};
    (not $expr:expr) => {};
}
fn main() {
    blah!(not 1);
}
```
This may even be more of an "implementation" question than a lang question, so perhaps this nomination is moot.
I think the core problem is that
My ideal solution (if we can find a way to do it without breaking the world) would be to make the second arm in the following macro useless:
```rust
macro_rules! check {
    ($e:expr) => {};
    (not $a:literal) => {};
}
```
Anything it could potentially match would go into the first arm instead, and then produce an error if it's not actually a valid
Otherwise for anything we try to add to the
After all, if people really want special handling for something, it would be better for them to put that first, like
so that if one day we did add
(Maybe that implies that for each macro matcher we'd have some "start set", like
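For the "put the special handling first" suggestion, here is a minimal sketch of that ordering. The macro name `check!` and the string labels are illustrative only; each arm expands to a label so the arm selected at compile time can be observed at run time. Because arms are tried in declaration order and the literal `not` token is compared before the `expr` parser is ever invoked, `not 1` reaches its dedicated arm whatever recovery the `expr` parser might perform.

```rust
// Illustrative macro: the special-case arm comes first, so the literal
// `not` token is compared before the `expr` parser ever sees the input.
macro_rules! check {
    (not $a:literal) => {
        "not arm"
    };
    ($e:expr) => {
        "expr arm"
    };
}
```

Any input that does not start with `not` fails the first arm on its first token and falls through to the general `expr` arm.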
The matching logic uses the starting set already (presumably as a fast path).
We could definitely do that for decl macros 2.0, since they are unstable. For decl macros 1.2, we could start issuing future-incompat warnings. I wonder if that'd make them less powerful.
I did some more investigation on this. This thread contains quite some confusion about the actually implemented matching semantics (mostly because of myself), but I think I can now clear this up.
The currently implemented macro matching semantics
One arm is tried after another in declaration order. If matching of an arm fails with the
When encountering a nonterminal specifier, a check is done whether the first token can start this nonterminal kind. If it cannot, the nonterminal is not entered (which can either lead to a
The parser will then parse as many tokens as necessary into that nonterminal. Importantly, it might not consume all input if it deems the further grammar to be invalid. For example,
If the parser fails to parse the nonterminal, a fatal error is emitted and
rust/compiler/rustc_expand/src/mbe/macro_parser.rs Lines 613 to 621 in 0a6b941
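The first-token check described above can be observed from ordinary, compiling code. In this sketch (the macro name `classify!` and its string labels are illustrative only), `=>` cannot begin an expression, so the `$e:expr` nonterminal of the first arm is never entered for input starting with `=>`, and matching falls through to the second arm.

```rust
// Illustrative macro: each arm expands to a label naming itself, so the
// arm chosen at compile time can be checked at run time.
macro_rules! classify {
    // `=>` cannot begin an expression, so for input starting with `=>`
    // this nonterminal is not entered and the arm fails without parsing.
    ($e:expr) => {
        "expr arm"
    };
    (=> $e:expr) => {
        "arrow arm"
    };
}
```

Note that the ordering matters only because the first arm is tried first; the arrow arm is reachable here purely thanks to the start-token check, not thanks to any error recovery.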
If, after all matchers have been exhausted, there are still tokens around in the input, the matching is considered a
With these semantics, this code should indeed not compile, as the expression grammar requires a
```rust
macro_rules! what {
    ($e:expr) => { compile_error!("no") };
    ($($tt:tt)*) => {};
}
what! {
    match 1;
}
```
On the other hand, this code works just fine:
```rust
macro_rules! check {
    ($e:expr) => {};
    (NOT $a:literal) => {};
}
check! { NOT 1 }
```
The expression parser is invoked with
Where this goes wrong
```rust
macro_rules! check {
    ($e:expr) => {};
    (not $a:literal) => {};
}
check! { not 1 }
```
This should be equivalent to
rust/compiler/rustc_parse/src/parser/expr.rs Lines 618 to 620 in 0a6b941
But this breaks the semantics described above. During nonterminal parsing, the parser must not consume more tokens than necessary, as returning early and leaving tokens behind is an important part of matching nonterminals.
What now
Based on all of this, I have two conclusions. Firstly, I believe that the current semantics make sense and should continue working like this. I would still prefer this to be discussed in the lang team anyway, just to make sure. Secondly, it is clear that eager recovery that consumes more tokens than necessary should never happen in the nonterminal parser. So the effort started by #103544 should continue, as this is clearly a bug.
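The difference between a nonterminal parser that returns early and one that recovers eagerly can be modeled in a few lines. This is a hypothetical, simplified sketch, not rustc's code: tokens are strings, `parse_expr_conservative` and `parse_expr_eager` stand in for the real expression parser, and `check_arm` hand-codes the two arms of the lowercase `not` variant of `check!` from the comment above, returning the index of the arm that matched. Unlike rustc, the toy treats a failed nonterminal parse as a plain non-match rather than a fatal error.

```rust
// Hypothetical, simplified model of nonterminal parsing during
// `macro_rules!` matching (not rustc's code). A nonterminal parser reports
// how many tokens it consumed; the matcher continues with what is left.

/// A path-like word or an integer literal.
fn is_atom(tok: &str) -> bool {
    !tok.is_empty() && tok.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Conservative toy `expr` parser: an optional `!` followed by one atom.
/// It stops as soon as the grammar cannot continue, leaving the rest of the
/// input to the matcher. Returns `None` if no expression can be parsed.
fn parse_expr_conservative(tokens: &[&str]) -> Option<usize> {
    let start = if tokens.first() == Some(&"!") { 1 } else { 0 };
    match tokens.get(start) {
        Some(tok) if is_atom(tok) => Some(start + 1),
        _ => None,
    }
}

/// Eagerly "recovering" toy parser: additionally treats `not x` as if it
/// were `!x`, consuming a token that the grammar does not allow here.
fn parse_expr_eager(tokens: &[&str]) -> Option<usize> {
    match tokens {
        ["not", tok, ..] if is_atom(tok) => Some(2),
        _ => parse_expr_conservative(tokens),
    }
}

type ExprParser = fn(&[&str]) -> Option<usize>;

// Hand-coded matcher for:
//     macro_rules! check {
//         ($e:expr) => {};        // arm 0
//         (not $a:literal) => {}; // arm 1
//     }
// Returns the index of the first matching arm, or `None` if no arm matches.
fn check_arm(tokens: &[&str], parse_expr: ExprParser) -> Option<usize> {
    // Arm 0 matches only if the `expr` nonterminal consumed *all* input;
    // leftover tokens make the arm fail, so the next arm is tried.
    if parse_expr(tokens) == Some(tokens.len()) {
        return Some(0);
    }
    // Arm 1: the literal token `not`, then one integer literal.
    if let ["not", lit] = tokens {
        if !lit.is_empty() && lit.chars().all(|c| c.is_ascii_digit()) {
            return Some(1);
        }
    }
    None
}
```

With the conservative parser, `not 1` parses only `not` as a path expression and leaves `1` behind, so arm 0 fails and arm 1 matches; with the eager parser, arm 0 swallows both tokens, which is the observable change in matching that this issue describes.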
…ler-errors Add flag to forbid recovery in the parser To start the effort of fixing rust-lang#103534, this adds a new flag to the parser, which forbids the parser from doing recovery, which it shouldn't do in macros. This doesn't add any new checks for recoveries yet and is just here to bikeshed the names for the functions here before doing more. r? `@compiler-errors`
We discussed this in today's @rust-lang/lang meeting. We'd like both of these examples to work. Making the one with
Only do parser recovery on retried macro matching Eager parser recovery can break macros, so we don't do it at first. But when we already know that the macro failed, we can retry it with recovery enabled to still emit useful diagnostics. Helps with rust-lang#103534
I wonder if we should take a different approach for fixing this issue instead of using
What about changing
This way, we can still show a custom error message without making the syntactically malformed input well-formed. Right now, for example,
Just throwing my thoughts out there.
In some cases this may work, but not always. Looking at the
But I agree that
…iser Only suggest turbofish in patterns if we may recover Fixes [after backport] rust-lang#115780. CC rust-lang#103534.
Introduction
In multiple cases, macro fragment specifiers like `expr` and `stmt` match more token streams than they should, as a consequence of the parser trying to recover from obviously invalid Rust code to provide better immediate and subsequent error messages to the user.
Why Is This a Concern?
The user should be allowed to assume that a fragment specifier only matches valid Rust code; anything else would make the fragment specifier fail to live up to its name and, as a result, render it useless (to exaggerate).
One use of macros is the ability to define embedded / internal domain-specific languages (DSLs). Part of that is defining new syntax which might not necessarily be valid Rust syntax. Declarative macros allow users to create several macro rules / matchers enabling relatively fine-grained matching on tokens. Obviously, when writing those rules, macro authors need to know what a given fragment specifier accepts in order to confidently determine which specific rule applies for a given input. If the grammar used by a fragment specifier is actually larger than promised and basically unknown (implementation-defined to be precise), this becomes an impossible task.
Not only that. If we don't do anything, the grammars matched by fragment specifiers will keep changing over time as more and more recovery code gets added. This breaks Rust's backward compatibility guarantees! Macro calls that compile with one version of the compiler might no longer compile with a future version. In fact, backward compatibility has already been broken multiple times in the past, without notice, by (some) PRs introducing more error recovery.
Examples
There might be many more cases than those listed below, but finding them takes a lot of time spent experimenting and looking through the parser. I'll try to extend the list over time.
Expressions
Statements
Other Fragments (e.g. Items, Types)
[no known cases at the time of this writing (active search ongoing)]
Editorial Notes
I used to list some more cases above (which some of you might have noticed) but I've since removed them as they've turned out to be incorrect. Here they are:
Related issue: #90256 (concerning procedural macros).
@rustbot label A-macros A-diagnostics A-parser T-compiler