-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the roslyn tokenizer #10702
Use the roslyn tokenizer #10702
Conversation
… about the individual differences between most C# operators, as well the difference between C# numeric types, we'll just use a single kind in the new parser to keep things simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would honestly not try to diff this against the old code. Pretty much every line is touched. Just review as if it was a new implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass and sifted through all the test diffs. Nothing smelled funny in the new tokenizer, but I'll give it a closer look tomorrow.
Whitespace;[ ]; | ||
Identifier;[foo]; | ||
Whitespace;[ ]; | ||
Assign;[=]; | ||
Whitespace;[ ]; | ||
StringLiteral;[@"blah LFblah; LF<p>Foo</p>LFblah LFblah];RZ1000(21:0,21 [1] ) | ||
StringLiteral;[@"blah LFblah; LF<p>Foo</p>LFblah LFblah];RZ1000(21:0,21 [2] ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the difference here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure. I'll look at this tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, this is part of handling strings better. The span for the diagnostic is now @"
, not just the @
.
src/Compiler/Microsoft.AspNetCore.Razor.Language/test/Legacy/CSharpTokenizerLiteralTest.cs
Show resolved
Hide resolved
src/Compiler/test/Microsoft.NET.Sdk.Razor.SourceGenerators.Tests/RazorSourceGeneratorTests.cs
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Syntax/SyntaxKind.cs
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/CSharpCodeParser.cs
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Show resolved
Hide resolved
@@ -119,7 +118,7 @@ SyntaxToken ITokenizer.NextToken() | |||
return default(SyntaxToken); | |||
} | |||
|
|||
public void Reset() | |||
public virtual void Reset(int position) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the position
parameter be used inside this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not needed in the base type. Only in the derived type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we assert that it's never passed here then? E.g., that it's equal to StartState
or making it int? position = null
and asserting that it's null
here.
Or if we never even call it for the base type, can it be abstract rather than virtual?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do any of that. The reason that we need this for the roslyn-based tokenizer is that the roslyn-based tokenizer needs to be kept in sync as well. The other tokenizers don't have to do that work, and don't care what position is passed. They just go off what the SeekableTextReader says is the current position. I considered moving all of the Reset code into here, but that felt like more trouble that it was worth.
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/Tokenizer.cs
Outdated
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
@@ -525,6 +525,7 @@ public DesignTimeOptionsFeature(bool designTime) | |||
public void Configure(RazorParserOptionsBuilder options) | |||
{ | |||
options.SetDesignTime(_designTime); | |||
options.UseRoslynTokenizer = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also run razor-toolset-ci pipeline (which tests bunch of existing razor projects) with the roslyn tokenizer enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do this before we merge the feature branch and address any breaks that come up with dedicated tests at that point.
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
… to just be from the back of the results, as that's the most common order for the parser to reset in. I've also refactored a common advance loop to reduce duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a few comments on where things might be clearer + the usual smattering of BOM fun.
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
case SyntaxKind.LeftParenthesis: | ||
return "("; | ||
case SyntaxKind.RightParenthesis: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider pulling the assert cases out into a separate local function to make the actual logic clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm honestly not sure what you mean here. How would that make the logic clearer?
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/NativeCSharpTokenizer.cs
Outdated
Show resolved
Hide resolved
...Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Syntax/InternalSyntax/SyntaxTokenCache.cs
Outdated
Show resolved
Hide resolved
src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Syntax/SyntaxKind.cs
Outdated
Show resolved
Hide resolved
src/Shared/Microsoft.AspNetCore.Razor.Test.Common/Language/Legacy/ParserTestBase.cs
Outdated
Show resolved
Hide resolved
...test/Microsoft.AspNetCore.Razor.Test.Common.Tooling/Language/Legacy/ToolingParserTestBase.cs
Outdated
Show resolved
Hide resolved
@chsienki @jjonescz @DustinCampbell for reviews please. |
|
||
using SyntaxToken = Microsoft.AspNetCore.Razor.Language.Syntax.InternalSyntax.SyntaxToken; | ||
using SyntaxFactory = Microsoft.AspNetCore.Razor.Language.Syntax.InternalSyntax.SyntaxFactory; | ||
using CSharpSyntaxKind = Microsoft.CodeAnalysis.CSharp.SyntaxKind; | ||
using CSharpSyntaxToken = Microsoft.CodeAnalysis.SyntaxToken; | ||
using CSharpSyntaxTriviaList = Microsoft.CodeAnalysis.SyntaxTriviaList; | ||
using Microsoft.AspNetCore.Razor.PooledObjects; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving this using to the first group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address in a follow up.
This is the big one: using the Roslyn tokenizer during Razor parsing. I've done my best to separate out various pieces into separate commits to make the review a bit simpler, but there's no getting around the lexer change being complicated. I would recommend commit-by-commit to make it as simple as possible. Fixes #10568, fixes #7084.