Poor error recovery after missing > #24642
Comments
Maybe it would help a little bit (it would definitely help in other cases as well) if the parser recognized a block directly inside a type declaration and parsed it as a BlockSyntax. Right now it skips the opening curly brace and eats the closing one.
To be clear: this does not address the underlying issue of the missing I'm not sure how the
An astute observation! While working on ANTLR 4, we found that a preference to close a nested construct (e.g. insert a missing
@sharwell True, it does seem a lot less likely that you'd be missing a comma, except when you're writing in a new parameter. Maybe the case of a missing closing bracket/paren was considered less important because the IDE automatically inserts them as you're typing? As a separate issue, it might make sense to investigate whether the parser could make less of a distinction between scripting and regular syntax and push those errors to the semantic model (which already reports ERR_GlobalStatement), if it doesn't come at a performance cost. That also sounds like a good general strategy for error recovery: if illegal syntax is found in a certain context, it is likely to be a legal construct somewhere else in the language. It's better to parse it and then tell users they're doing something wrong with one clear error message than to report 50 errors, one for every single token.
That's the approach we've been taking for a while. We even codified it midway through last year when I was doing PRs like: 001294b#diff-ac2b4121f27df06f497115cc68277654 In general, the strategy is to have the parser only try to fit the code to the syntax model. If you can't fit the code into the syntax model, you definitely error. Once it fits, though, extra checks are performed during semantic analysis instead.
It's not too tough in practice. Because our parser is hand-written recursive descent, we can easily write whatever code we want during error recovery. So we can examine the surrounding state to try to figure out which sort of issue it is and make the right choice based on what we see around it. By default there is a baked-in algorithm that is both correct and pretty reasonable. But when it produces suboptimal results, it's fine to provide a specialized error resolution strategy.
My suggestion was that the parser would parse out GlobalStatementSyntax, just like it does in the scripting dialect.
Of course, I just wasn't sure what it should do in this case or how it can figure out which sort of issue this is. GetAllValues could be a type after all. Do you have in mind some sort of lookahead from the point after the tuple
I think there is an option with the
Honestly, you'd just have to write up a bunch of cases, and then see what you could do with the code. Ideally, whatever strategy you ended up implementing would be fairly lightweight and would not be too complex. In other words: error recovery is an unending set of work in the parser. We tend to drive it based on real use cases where things go bad, combined with reasonable tweaks (i.e. looking at the local area, or doing small bits of speculative lookahead) that improve these cases without regressing anything else.
Taking a quick look at a bunch of cases: In a case where we expect either a As such the obvious first step is to assume we are missing a The same applies for If we lookahead and find a
Could you expand on this? Why is this the case? Say the user really is missing a comma in a list, and you put in the
Can you clarify what you mean by 'lookAhead' here?
compare https://sharplab.io/#v2:D4AQDABCCMDcCwAoEBmKAmCBhCBvJEhUaAIgJYDGALmQPYB2AhgE4CeAPGfVQHxTRg+AWQAUASjwEi0qYQC+SOUA They're both pretty terrible, but the method declaration is at least picked up in the first. Similarly for a missing So in a toss-up I would err on the side of adding a
I think a sensible algorithm might be to say that a GenericTypeArgumentList will always appear somewhere before one of:
We look ahead until the next such token, and if we are able to find a closing > token (counting opening < tokens along the way) we assume we are missing a comma. Otherwise we assume we are missing a > token. If we get past some constant number of tokens, we give up and assume it's a Does that sound sensible? At the cost of making the function longer (but since it's a switch, presumably not much less performant) we could add all the myriad SyntaxKinds that can't appear in a generic type argument list, but I think the above syntax kinds will be sufficient.
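A minimal sketch of the lookahead described in this comment, written against a toy token enum rather than Roslyn's SyntaxKind. Everything named here (Tok, MissingToken, TypeArgumentListRecovery, Guess, the Terminators set and the MaxLookahead cap) is hypothetical. In particular, the stop set is an illustrative stand-in and not the list proposed in the thread, and the fallback used when the scan gives up is an arbitrary choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for this sketch; these are not Roslyn's SyntaxKind values.
enum Tok { Identifier, Comma, LessThan, GreaterThan, OpenParen, OpenBrace, Semicolon, EndOfFile }

enum MissingToken { Comma, GreaterThan }

static class TypeArgumentListRecovery
{
    // Assumed stop set: tokens this sketch treats as "a type argument list must have
    // ended before here". Illustrative only; not the list from the thread.
    static readonly HashSet<Tok> Terminators = new HashSet<Tok>
    {
        Tok.OpenParen, Tok.OpenBrace, Tok.Semicolon, Tok.EndOfFile
    };

    // Assumed cap; the comment only says "some constant number of tokens".
    const int MaxLookahead = 32;

    // 'start' is the index where a ',' or '>' was expected but something else was found.
    // One '<' is assumed to be open already at that point.
    public static MissingToken Guess(IReadOnlyList<Tok> tokens, int start)
    {
        int depth = 1;
        for (int i = start; i < tokens.Count && i - start < MaxLookahead; i++)
        {
            Tok t = tokens[i];
            if (Terminators.Contains(t))
                break; // reached a stop token without closing the list

            if (t == Tok.LessThan)
            {
                depth++; // nested type argument list
            }
            else if (t == Tok.GreaterThan)
            {
                depth--;
                if (depth == 0)
                    return MissingToken.Comma; // the list does close later on
            }
        }

        // No closing '>' found, or lookahead budget exhausted (arbitrary fallback here).
        return MissingToken.GreaterThan;
    }
}
```

For the token shape of `Dictionary<string int> x;`, calling Guess at the position of `int` finds the closing `>` and reports a missing comma. For the shape of `List<int Foo() {`, the scan hits `(` first and reports a missing `>`.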
Note that type parameter lists are parsed entirely separately in the parser, so I would deal with them separately from type argument lists.
FWIW, this feels like something that would be valuable for many lists. I've often been surprised that the C# parser has a dedicated loop for each type of list construct. For example, when I wrote the TypeScript parser, we just had a single function for parsing lists (and one for parsing delimited lists), so that any improvements we made would be adopted across all lists: It seems a shame that if we did this work here it would only apply to this type of list instead of getting the benefit for all lists.
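To illustrate the idea of one shared routine for every delimited list, here is a rough sketch over plain string tokens. It is not the TypeScript parser's code and not a Roslyn API; ListParser, ParseDelimitedList and the callback parameters are all hypothetical names. The point it tries to show is that recovery for a missing separator or a missing closer is written once and applies to every caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy parser; none of these names exist in Roslyn or TypeScript.
sealed class ListParser
{
    private const string Eof = "<eof>";
    private readonly IReadOnlyList<string> _tokens;
    private int _pos;

    public List<string> Errors { get; } = new List<string>();

    public ListParser(IReadOnlyList<string> tokens) { _tokens = tokens; }

    private string Current => _pos < _tokens.Count ? _tokens[_pos] : Eof;

    // Shared by all delimited lists: the caller says what an element looks like,
    // how to parse one, and which separator and closer the list uses.
    public List<T> ParseDelimitedList<T>(
        Func<string, bool> isElementStart,
        Func<string, T> parseElement,
        string separator,
        string closer)
    {
        var items = new List<T>();
        while (Current != closer && Current != Eof)
        {
            if (!isElementStart(Current))
                break; // not part of this list; let the caller deal with it

            items.Add(parseElement(_tokens[_pos]));
            _pos++;

            if (Current == separator)
            {
                _pos++; // consume the separator and parse the next element
                continue;
            }
            if (Current == closer)
                break;
            if (isElementStart(Current))
            {
                // Another element follows directly: report a missing separator and keep going.
                Errors.Add($"'{separator}' expected");
                continue;
            }
            break;
        }

        if (Current == closer)
            _pos++; // consume the closer
        else
            Errors.Add($"'{closer}' expected");

        return items;
    }
}
```

A type argument list and a parameter list would both call ParseDelimitedList with different callbacks and delimiters, so a better heuristic added inside the loop would benefit both.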
@CyrusNajmabadi
Some fun statistics about the length of TypeArgumentLists in CoreFX by token length:
So that gives us an idea of how far we would need to look ahead in the worst-case scenario.
My point was about doing this fix for all separated lists, not just this list. :)
@CyrusNajmabadi
interface IMissingTokenDisambiguatorStateMachine
{
    void Reset();
    DisambiguationResult FeedNextToken(SyntaxKind kind);
}

enum DisambiguationResult
{
    MissingSeparator,
    MissingCloser,
    NeedMoreInfo
}

And then having a method:

bool IsMissingSeparator(IMissingTokenDisambiguatorStateMachine disambiguator)
{
    var lookAheadCount = 0;
    disambiguator.Reset();
    while (true)
    {
        var currentKind = this.PeekToken(lookAheadCount).Kind;
        // Nothing left to inspect: default to treating it as a missing separator.
        if (currentKind == SyntaxKind.EndOfFileToken)
            return true;
        switch (disambiguator.FeedNextToken(currentKind))
        {
            case DisambiguationResult.MissingSeparator:
                return true;
            case DisambiguationResult.MissingCloser:
                return false;
            case DisambiguationResult.NeedMoreInfo:
                // Undecided: look one token further ahead.
                lookAheadCount++;
                break;
            default:
                throw new InvalidOperationException();
        }
    }
}

I don't know if this generalisation is useful enough to be worth it though.
I linked how we did it in TypeScript. It's very simple.
Also note there's no general way to improve error recovery inside a method if the '>' token is missing, as we usually assume we are dealing with an expression, not a type, when the '>' is missing. For example:

List<Int a = new List<Int>();
// will be parsed as
List < Int; a = new List<Int>();

So this fix will be about cases where we can unambiguously parse it as a type.
@CyrusNajmabadi did #69482 also indirectly fix this, or is it still a problem?
That fix would not help this.
Reported at Link
Version Used: 15.6 Preview 3
Steps to Reproduce:
Expected Behavior:
A message saying the > is missing between Description) and GetAllValues.
Actual Behavior:
48 errors, and an editor that looks like this: