Poor error recovery after missing > #24642

sharwell · 2018-02-05T13:37:16Z

Reported at Link
Version Used: 15.6 Preview 3

Steps to Reproduce:

using System;
using System.Collections.Generic;
using System.Linq;

namespace Microsoft.Example
{
    class Class1
    {
        public static IEnumerable<(String Value, String Description) GetAllValues(Type t)
        {
            if (!t.IsEnum)
                throw new ArgumentException("no good");

            return Enum.GetValues(t).Cast<Enum>().Select(e => (e.ToString(), e.ToString()));
        }
    }
}

Expected Behavior:

A message saying the > is missing between Description) and GetAllValues.

Actual Behavior:

48 errors, and an editor that looks like this:

The text was updated successfully, but these errors were encountered:

Neme12 · 2018-02-05T13:48:26Z

Maybe it would help a little bit (it would definitely help in other cases as well) if the parser recognized a block directly inside a type declaration and parsed it as a BlockSyntax. Right now it skips the opening curly brace and eats the closing one.

Neme12 · 2018-02-05T15:23:59Z

I made a prototype of this:

to be clear: this does not address the underlying issue of the missing >. The method signature isn't parsed correctly. This just provides a way to gracefully handle the situation when we do fail to parse the signature and if the parser finds a block directly inside a class. But there is a concern here that the error is too small and easy to miss given that the whole block is invalid and we couldn't parse a method at all.

I'm not sure how the > could be addressed. If we insert a missing > after parsing a complete type, it would harm recovery in the case where you forgot to type a , instead - it would presume you meant a >. This seems like a hard problem to solve.

sharwell · 2018-02-05T15:51:13Z

I'm not sure how the > could be addressed. If we insert a missing > after parsing a complete type, it would harm recovery in the case where you forgot to type a , instead - it would presume you meant a >. This seems like a hard problem to solve.

An astute observation! While working on ANTLR 4, we found that a preference to close a nested construct (e.g. insert a missing >) typically provided better large-scale error handling than a preference to stay inside a looping construct (e.g. insert a missing ,).

Neme12 · 2018-02-05T16:56:41Z

@sharwell True, it does seem a lot less likely that you'd be missing a comma, except when you're writing in a new parameter. Maybe the case of a missing closing bracket/paren was considered less important because the IDE automatically inserts them as you're typing?

As a separate issue, it might make sense to investigate if the parser could make less distinction between scripting/regular syntax and push those errors to the semantic model (which already reports ERR_GlobalStatement), if it doesn't come at a performance cost. That also sounds like good general strategy for error recovery because if there's illegal syntax found in a certain context, it's likely it might be a legal construct in some place in the language. It's better to parse it and then tell them they're doing something wrong with 1 clear error message rather than having 50 errors for every single token.

CyrusNajmabadi · 2018-02-06T03:08:10Z

As a separate issue, it might make sense to investigate if the parser could make less distinction between scripting/regular syntax and push those errors to the semantic model (which already reports ERR_GlobalStatement), if it doesn't come at a performance cost.

That's the approach we've been taking for a while. We even codified it midway last year when i was doing PRs like: 001294b#diff-ac2b4121f27df06f497115cc68277654

In general, the strategy to take is to try to get the parser to only try to fit the code to the Syntax-Model. If you can't fit the code into the SyntaxModel, you definitely error. Once you can fit though, extra checks are performed during semantic analysis instead.

CyrusNajmabadi · 2018-02-06T03:10:10Z

I'm not sure how the > could be addressed. If we insert a missing > after parsing a complete type, it would harm recovery in the case where you forgot to type a , instead - it would presume you meant a >. This seems like a hard problem to solve.

It's not too tough in practice. Because our parser is hand-written recursive descent, we have the easy ability to write whatever code we want during error recovery. So we can examine the surrounding state to try to figure out which sort of issue it it and make the right choice based on what we see around it.

By default there is a baked in algorithm that is both correct, and pretty reasonable. But when it produces suboptimal results, it's fine to provide a specialized error resolution strategy.

Neme12 · 2018-02-12T17:46:07Z

In general, the strategy to take is to try to get the parser to only try to fit the code to the Syntax-Model. If you can't fit the code into the SyntaxModel, you definitely error. Once you can fit though, extra checks are performed during semantic analysis instead.

My suggestion was that the parser would parse out GlobalStatementSyntax, just like it does in the scripting dialect.

Because our parser is hand-written recursive descent, we have the easy ability to write whatever code we want during error recovery

Of course, I just wasn't sure what it should do in this case or how it can figure out which sort of issue this it. GetAllValues could be a type after all. Do you have in mind some sort of lookahead from the point after the tuple ) where it expected either > or, to look if there's a matching > for the last < and if there is, insert a comma just like today but if there isn't, insert a > instead?

AdamSpeight2008 · 2018-02-12T18:35:15Z

I think there is an option with the syntax.xml to insert if missing. If so that would good option.

CyrusNajmabadi · 2018-02-12T19:55:33Z

Of course, I just wasn't sure what it should do in this case or how it can figure out which sort of issue this it. GetAllValues could be a type after all. Do you have in mind some sort of lookahead from the point after the tuple ) where it expected either > or, to look if there's a matching > for the last < and if there is, insert a comma just like today but if there isn't, insert a > instead?

Honestly, you'd just have to write up a bunch of cases, and then see what you could do with the code. Ideally, whatever strategy you ended up implementing would be fairly lightweight and would not be too complex.

In other words: error recovery is an unending set of work in the parser. We tend to drive it based on real use cases where things go bad, combined with reasonable tweaks (i.e. looking at the local area, or doing small bits of speculative lookahead) that improve these cases without regressing anything else.

YairHalberstadt · 2019-01-03T08:06:53Z

Taking a quick look at a bunch of cases:

In a case where we expect either a > or a , token, then assuming we are missing a , when we are actually missing a > leads to terrible error recovery. Assuming we are missing a > when we are actually missing a , isn't brilliant, but tends to only cause localized errors.

As such the obvious first step is to assume we are missing a > not a , in such situations.

The same applies for > and . (i.e. List<A B might be List<A.B> or List<A> B)

If we lookahead and find a > without a matching <, maybe we could decide that actually a , was in order. I don't think there's any scenario where we would assume we were missing a . rather than a , as they lead to equivalent error recovery, are indistinguishable syntactically if the token is missing, and the latter is probably more common than the former.

CyrusNajmabadi · 2019-01-03T16:45:52Z

Taking a quick look at a bunch of cases:

In a case where we expect either a > or a , token, then assuming we are missing a , when we are actually missing a > leads to terrible error recovery. Assuming we are missing a > when we are actually missing a , isn't brilliant, but tends to only cause localized errors.

Could you expand on this? Why is this the case? Say the user really is missing a comma in a list, and you put in the >, won't that make the rest of the type-arg list just bogus stuff that will throw off the parser in other ways?

If we lookahead and find a > without a matching <,

Can you clarify what you mean by 'lookAhead' here?

YairHalberstadt · 2019-01-03T21:37:57Z

@CyrusNajmabadi

Could you expand on this? Why is this the case? Say the user really is missing a comma in a list, and you put in the >, won't that make the rest of the type-arg list just bogus stuff that will throw off the parser in other ways?

compare https://sharplab.io/#v2:D4AQDABCCMDcCwAoEBmKAmCBhCBvJEhUaAIgJYDGALmQPYB2AhgE4CeAPGfVQHxTRg+AWQAUASjwEi0qYQC+SOUA
with
https://sharplab.io/#v2:D4AQDABCCMDcCwAoEBmKAmCBhCBvJEhUaAMgJYDOALgDxkB2VEAsgBQCUeBRP3hAvkn5A===

They're both pretty terrible, but the Method declaration is at lease picked up in the first. Similarly for a missing ,`>` in argument lists. They're both pretty much as bad as each other in variable declarations.

So in a toss-up I would err on the side of adding a > token

YairHalberstadt · 2019-01-03T21:40:02Z

Can you clarify what you mean by 'lookAhead' here?

I think a sensible algorithm might be to say that a GenericTypeArgumentList will always appear somewhere before one of:

;
=
{
=>
: (in generic constraints, although we could rely on '{' )
where (in generic constraints, although we could rely on '{' )
new (a common keyword in expressions/statements although we could rely on ';' )

We look ahead till the next such token, and if we are able to find a closing > token (counting opening < along the way) we assume we are missing a comma.

Else we assume we are missing a > token.

If we get past some constant number of tokens, we give up and assume it's a > given that such a long type argument list is extremely unlikely

Does that sound sensible?

At the cost of making the function longer (but since it's a switch, presumably not much less performant) we could add all the miriad SytntaxKinds that can't appear in a Generic Type Argument List, but I think the above syntax kinds will be sufficient.

YairHalberstadt · 2019-01-03T21:47:29Z

Note that Type Parameter Lists are parsed entirely seperately in the parser, and so I would deal with them seperately to TypeArgumentLists

CyrusNajmabadi · 2019-01-04T01:13:34Z

Note that Type Parameter Lists are parsed entirely seperately in the parser, and so I would deal with them seperately to TypeArgumentLists

FWIW, this feels like something that would be valuable for many lists. I've oft been surprised that the C# parser has a dedicated loop for each type of list construct. For example, when i wrote the TypeScript parser, we just had a single function for parsing lists (and one for parsing delimited lists), that way any improvements we made would be adopted across all lists:

https://github.com/Microsoft/TypeScript/blob/b7d7d5f7b39a5b9619c77590e5fe7f434ed68f1e/src/compiler/parser.ts#L1687-L1708

https://github.com/Microsoft/TypeScript/blob/b7d7d5f7b39a5b9619c77590e5fe7f434ed68f1e/src/compiler/parser.ts#L2046-L2112

It seems a shame that if we did this work here it would only apply to this type of list instead of getting the benefit for all lists.

YairHalberstadt · 2019-01-04T09:31:44Z

@CyrusNajmabadi
Type parameter lists can only be one token per parameter, so fixing this for them is extremely simple - you just look ahead one token and see what you find.
I'd rather do that in a seperate PR to make things easier to review.

YairHalberstadt · 2019-01-04T09:34:50Z

Some fun statistics about the length of TypeArgumentLists in CoreFX by token length:

Token Length	Count	Cumulative Percentage
3.00	76276	0.774604
4.00	525	0.779935
5.00	10098	0.882483
6.00	3829	0.921368
7.00	1355	0.935128
8.00	1953	0.954961
9.00	1267	0.967828
10.00	775	0.975698
11.00	578	0.981568
12.00	248	0.984087
13.00	417	0.988321
14.00	283	0.991195
15.00	110	0.992312
16.00	220	0.994547
17.00	46	0.995014
18.00	55	0.995572
19.00	32	0.995897
20.00	30	0.996202
21.00	53	0.99674
22.00	26	0.997004
23.00	29	0.997299
24.00	14	0.997441
25.00	2	0.997461
26.00	116	0.998639
28.00	23	0.998873
30.00	10	0.998974
31.00	3	0.999005
32.00	11	0.999116
34.00	23	0.99935
35.00	2	0.99937
36.00	3	0.999401
38.00	2	0.999421
39.00	14	0.999563
41.00	7	0.999634
43.00	5	0.999685
45.00	5	0.999736
46.00	1	0.999746
47.00	5	0.999797
48.00	1	0.999807
49.00	5	0.999858
51.00	7	0.999929
53.00	6	0.99999
56.00	1	1

So that gives us an idea for how far we would need to lookahead in the worst case scenario

CyrusNajmabadi · 2019-01-04T15:37:05Z

Type parameter lists can only be one token per parameter, so fixing this for them is extremely simple - you just look ahead one token and see what you find.
I'd rather do that in a seperate PR to make things easier to review.

My point was about doing this fix for all separated lists, not just this list. :)

YairHalberstadt · 2019-01-05T19:32:43Z

@CyrusNajmabadi
I don't see how the fix could be generalised for all seperated lists. The best I could see would be creating some type of interface along the lines of

interface IMissingTokenDisambuguatorStateMachine
{
     void Reset();
     DisambiguationResult FeedNextToken(SyntaxKind kind);
}

enum DisambiguationResult
{
    MissingSeperator,
    MissingCloser,
    NeedMoreInfo
}

And then having a method:

bool IsMissingSeperator(IMissingTokenDisambuguatorStateMachine disambiguator)
{
    var lookAheadCount = 0;
    disambiguator.Reset();
    while(true)
    {
         var currentKind = this.PeekToken(lookAheadCount).Kind;
         if(currentKind == SyntaxKind.EndOfFile)
             return true; 
         switch(disambiguator.FeedNextToken(currentKind))
         {
             case DisambiguationResult.MissingSeperator:
                 return true;
             case DisambiguationResult.MissingCloser:
                 return false;
             case DisambiguationResult.NeedMoreInfo:
                 lookAheadCount++;
             default:
                 throw new InvalidOperationException();
         }
    }
}

I don't know if this generalisation is useful enough to be worth it though.

CyrusNajmabadi · 2019-01-05T20:40:59Z

I don't see how the fix could be generalised for all seperated lists.

I linked how we did it in TypeScript. It's very simple.

YairHalberstadt · 2019-01-05T21:45:27Z

Also note there's no general way to improve error recovery inside a method if the '>' token is missing, as we usually assume we are dealing with an expression not a type when the '>' is missing.

for example

List<Int a = new List<Int>();

// will be parsed as 
List < Int; a = new List<Int>();

So this fix will be about cases where we can unambiguously parse it as a type.

Rekkonnect · 2023-08-17T06:34:27Z

@CyrusNajmabadi did #69482 also indirectly fix this, or is it still a problem?

CyrusNajmabadi · 2023-08-17T13:34:42Z

That fix would not help this.

sharwell added Bug help wanted The issue is "up for grabs" - add a comment if you are interested in working on it Area-Compilers Developer Community The issue was originally reported on https://developercommunity.visualstudio.com labels Feb 5, 2018

sgordeyev mentioned this issue Feb 17, 2018

Fixed poor error recovery after missing > #24911

Closed

sharwell added 4 - In Review A fix for the issue is submitted for review. and removed help wanted The issue is "up for grabs" - add a comment if you are interested in working on it labels Feb 17, 2018

jinujoseph added this to the 15.8 milestone Feb 19, 2018

jcouv added help wanted The issue is "up for grabs" - add a comment if you are interested in working on it and removed 4 - In Review A fix for the issue is submitted for review. labels Jul 4, 2018

jcouv modified the milestones: 15.8, 16.0 Jul 4, 2018

YairHalberstadt mentioned this issue Jan 3, 2019

Improve error recovery after missing ">" token #32124

Closed

YairHalberstadt mentioned this issue Jan 10, 2019

Improve error recovery in type argument list #32351

Closed

jinujoseph modified the milestones: 16.0, 16.3 Jun 9, 2019

jcouv modified the milestones: 16.3, Compiler.Next Jul 8, 2019

Rekkonnect mentioned this issue Aug 27, 2023

Improve generic type argument list error recovery #69734

Merged

jaredpar modified the milestones: Compiler.Next, Backlog Sep 12, 2023

333fred closed this as completed in #69734 Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Poor error recovery after missing > #24642

Poor error recovery after missing > #24642

sharwell commented Feb 5, 2018 •

edited by jinujoseph

Loading

Neme12 commented Feb 5, 2018 •

edited

Loading

Neme12 commented Feb 5, 2018 •

edited

Loading

sharwell commented Feb 5, 2018

Neme12 commented Feb 5, 2018 •

edited

Loading

CyrusNajmabadi commented Feb 6, 2018

CyrusNajmabadi commented Feb 6, 2018

Neme12 commented Feb 12, 2018 •

edited

Loading

AdamSpeight2008 commented Feb 12, 2018

CyrusNajmabadi commented Feb 12, 2018

YairHalberstadt commented Jan 3, 2019 •

edited

Loading

CyrusNajmabadi commented Jan 3, 2019

YairHalberstadt commented Jan 3, 2019

YairHalberstadt commented Jan 3, 2019 •

edited

Loading

YairHalberstadt commented Jan 3, 2019

CyrusNajmabadi commented Jan 4, 2019

YairHalberstadt commented Jan 4, 2019

YairHalberstadt commented Jan 4, 2019

CyrusNajmabadi commented Jan 4, 2019

YairHalberstadt commented Jan 5, 2019

CyrusNajmabadi commented Jan 5, 2019

YairHalberstadt commented Jan 5, 2019 •

edited

Loading

Rekkonnect commented Aug 17, 2023

CyrusNajmabadi commented Aug 17, 2023

Poor error recovery after missing > #24642

Poor error recovery after missing > #24642

Comments

sharwell commented Feb 5, 2018 • edited by jinujoseph Loading

Neme12 commented Feb 5, 2018 • edited Loading

Neme12 commented Feb 5, 2018 • edited Loading

sharwell commented Feb 5, 2018

Neme12 commented Feb 5, 2018 • edited Loading

CyrusNajmabadi commented Feb 6, 2018

CyrusNajmabadi commented Feb 6, 2018

Neme12 commented Feb 12, 2018 • edited Loading

AdamSpeight2008 commented Feb 12, 2018

CyrusNajmabadi commented Feb 12, 2018

YairHalberstadt commented Jan 3, 2019 • edited Loading

CyrusNajmabadi commented Jan 3, 2019

YairHalberstadt commented Jan 3, 2019

YairHalberstadt commented Jan 3, 2019 • edited Loading

YairHalberstadt commented Jan 3, 2019

CyrusNajmabadi commented Jan 4, 2019

YairHalberstadt commented Jan 4, 2019

YairHalberstadt commented Jan 4, 2019

CyrusNajmabadi commented Jan 4, 2019

YairHalberstadt commented Jan 5, 2019

CyrusNajmabadi commented Jan 5, 2019

YairHalberstadt commented Jan 5, 2019 • edited Loading

Rekkonnect commented Aug 17, 2023

CyrusNajmabadi commented Aug 17, 2023

sharwell commented Feb 5, 2018 •

edited by jinujoseph

Loading

Neme12 commented Feb 5, 2018 •

edited

Loading

Neme12 commented Feb 5, 2018 •

edited

Loading

Neme12 commented Feb 5, 2018 •

edited

Loading

Neme12 commented Feb 12, 2018 •

edited

Loading

YairHalberstadt commented Jan 3, 2019 •

edited

Loading

YairHalberstadt commented Jan 3, 2019 •

edited

Loading

YairHalberstadt commented Jan 5, 2019 •

edited

Loading