Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error recovery after missing ">" token #32124

Conversation

YairHalberstadt
Copy link
Contributor

@YairHalberstadt YairHalberstadt commented Jan 3, 2019

Previously we assumed that when a > or , token was missing, the missing token was a ,.

This PR assumes that the missing token is >, unless a lookahead of one suggests otherwise.

A fix to #24642

@YairHalberstadt YairHalberstadt requested a review from a team as a code owner January 3, 2019 14:15
@@ -5455,6 +5467,12 @@ private void ParseTypeArgumentList(out SyntaxToken open, SeparatedSyntaxListBuil
}

close = this.EatToken(SyntaxKind.GreaterThanToken);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please place a 'return' here to help separate out the normal code flow, and hte helper extension methods.

else if (this.IsPossibleType())
{
var nextToken = this.PeekToken(1);
if (nextToken.Kind == SyntaxKind.GreaterThanToken || nextToken.Kind == SyntaxKind.CommaToken) // presumably missing a comma
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is not really useful. It would be better to have a deeper explanation of what's going on here.

{
types.AddSeparator(this.EatToken(SyntaxKind.CommaToken));
types.Add(this.ParseTypeArgument());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see a lot of value with this helper. i think it would be easier to understand if it was just inlined.

@@ -6274,6 +6274,46 @@ class C
Assert.Equal((int)ErrorCode.ERR_IdentifierExpected, file.Errors()[0].Code);
}

[Fact]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs WorkItem attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the parameters to the WorkItem attribute?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkItem(24642, "https://github.com/dotnet/roslyn/issues/24642")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

.Select(d => ((IFormattable)d).ToString(null, EnsureEnglishUICulture.PreferredOrNull)));

Assert.Equal(16, syntaxTree.FindNodeOrTokenByKind(SyntaxKind.MethodDeclaration).Position);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a lot more tests of this new behavior. especially where tehre might be a > but we now assume that the lack of a , causes us to terminate the list earlier.I would particularly like a PR that first adds those tests (and has the errors currently), then makes the parser change, and has to update those tests.

i.e. to see if this causes a regressive behavior in some cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, we should have tests that validate the actual shape of the tree. See several other parsing test files for examples of this. For example, nothing about this test helps me know if the parser really understood this was a method with a generic return type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CyrusNajmabadi
That's fair enough and I will attempt to do so.
I think, but have not yet verified that error recovery here, although not perfect, is absolutely better than it was. i.e.there are no cases where error recovery is worse than it was previously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excluding a case where two commas in the type argument list are missed, but that is rare enough that I will not worry about it too much

@sharwell sharwell added the Community The pull request was submitted by a contributor who is not a Microsoft employee. label Jan 3, 2019
}
else if (this.IsPossibleType())
{
var nextToken = this.PeekToken(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand how peeking 1 token is sufficient. the 'possible type' may be unbounded in length. are you only testing simple 1-token types.

Copy link
Contributor Author

@YairHalberstadt YairHalberstadt Jan 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CyrusNajmabadi
You are right.
If lookahead is very quick per token, how far ahead is it performant to lookahead?

I think a sensible algorithm might be to say that a GenericTypeArgumentList will always appear somewhere before one of:

;
=
{
=>

We look ahead till the next such token, and if we are able to find a closing > token (counting opening < along the way) we assume we are missing a comma.

Else we assume we are missing a > token.

If we get past some contant number of tokens, we give up and assume it's a ,

Does that sound sensible?

At the cost of making the function longer (but since it's a switch, presumably not much less performant) we could add all the miriad SytntaxKinds that can't appear in a Generic Type Argument List, but I think the above syntax kinds will be sufficient.

@CyrusNajmabadi
Copy link
Member

You are right.
If lookahead is very quick per token, how far ahead is it performant to lookahead?

Performing more expensive computation in error cases is not really a problem (as long as it's not ridiculously expensive). That's because error cases are rare most of the time. i.e. in practice, files are 99+% correct 99+% of the time. Errors come in in a very localized fashion, but are then corrected by the user very quickly.

So, instead of just looking one token ahead, we should really be scanning a full type ahead in these error cases and looking to see what is after the next full type we see. We should also be testing with a wealth of interesting type-cases (including nested incomplete generics).

@YairHalberstadt
Copy link
Contributor Author

YairHalberstadt commented Jan 5, 2019

including nested incomplete generics
My current approach wouldn't work for

Dictionary<int List<int> M()
// this should be parsed as 
Dictionary<int, List<int>> M()
// but is instead parsed as 
Dictionary<int> List<int> M()

But i can't think of anyway to get that to work that isn't extremely complex.

@CyrusNajmabadi
Copy link
Member

Why can't you speculatively scan to to see if you have a type argument?

@YairHalberstadt
Copy link
Contributor Author

Why can't you speculatively scan to to see if you have a type argument?

Then you end up in trouble here:

void M(List<int list, string str)

You'd parse it as

void M(List<int, list, string> str)

You'd have to have different rules depending on where you were which would very quickly get very complex

@CyrusNajmabadi
Copy link
Member

You'd parse it as

What's wrong with that?

@YairHalberstadt
Copy link
Contributor Author

That requires assuming two tokens are missing rather than one, so is likely to be wrong more frequently

@YairHalberstadt
Copy link
Contributor Author

YairHalberstadt commented Jan 6, 2019

Why can't you speculatively scan to to see if you have a type argument?

It's also difficult to distinguish between

Dictionary<int int M()

// Which should be parsed as 

Dictionary<int int> M()

And

new Dictionary<int int M()

// Which should be parsed as

new Dictionary<int int M>()

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jan 6, 2019

I don't see what's difficult about distinguishing those. One is followed by an open paren, and thus looks like a name, and not a type-arg itself.

@CyrusNajmabadi
Copy link
Member

That requires assuming two tokens are missing rather than one, so is likely to be wrong more frequently

Why? I'm not understanding the relation here. It also seems like you may be overthinking this space. As mentioned previously, errors are rare. The only pop in occasionally as the user is in the middle of editing something. Heck, even wanting to change parsing here isn't something i'm strongly in favor of due to it not actually improving things substantively, while also potentially negatively impacting things.

@YairHalberstadt
Copy link
Contributor Author

YairHalberstadt commented Jan 6, 2019

I don't see what's difficult about distinguishing those. One is followed by an open paren, and thus looks like a name, and not a type-arg itself

In both cases they are followed by an open Paran, but in one case (a new expression) it is a type argument and in one (a return type) it isn't

@YairHalberstadt
Copy link
Contributor Author

Anyway, I'm going to close this for now, until I have something more concrete.

I feel like to make error recovery easier, the entire LanguageParser API needs to be reconsidered to make look-ahead easier. Currently all the various helper methods use the CurrentToken such as IsEndOfNamespace. Would it be acceptable to create overrides for all these methods which take a SyntaxToken as a parameter?

@YairHalberstadt YairHalberstadt deleted the ParserErrorRecoveryInGenerics branch January 6, 2019 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Community The pull request was submitted by a contributor who is not a Microsoft employee.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants