Use the roslyn tokenizer #10702

333fred · 2024-08-01T23:01:12Z

This is the big one: using the Roslyn tokenizer during Razor parsing. I've done my best to split the various pieces into separate commits to make the review a bit simpler, but there's no getting around the lexer change being complicated. I would recommend reviewing commit-by-commit to make it as simple as possible. Fixes #10568, fixes #7084.

333fred · 2024-08-02T00:55:42Z

src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs

I would honestly not try to diff this against the old code. Pretty much every line is touched. Just review as if it was a new implementation.

DustinCampbell

I did a first pass and sifted through all the test diffs. Nothing smelled funny in the new tokenizer, but I'll give it a closer look tomorrow.

DustinCampbell · 2024-08-02T00:39:55Z

...age/test/TestFiles/ParserTests/CSharpErrorTest/TerminatesVerbatimStringAtEndOfFile.stree.txt

                Whitespace;[ ];
                Identifier;[foo];
                Whitespace;[ ];
                Assign;[=];
                Whitespace;[ ];
-                StringLiteral;[@"blah LFblah; LF<p>Foo</p>LFblah LFblah];RZ1000(21:0,21 [1] )
+                StringLiteral;[@"blah LFblah; LF<p>Foo</p>LFblah LFblah];RZ1000(21:0,21 [2] )


Why the difference here?

333fred: Hmm, not sure. I'll look at this tomorrow.

333fred: Ah, OK, this is part of handling strings better. The diagnostic span now covers @" (two characters) rather than just the @, which is why the length changed from [1] to [2].

jjonescz · 2024-08-02T09:29:26Z

src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/Tokenizer.cs

@@ -119,7 +118,7 @@ SyntaxToken ITokenizer.NextToken()
        return default(SyntaxToken);
    }

-    public void Reset()
+    public virtual void Reset(int position)


Should the position parameter be used inside this method?

333fred: No, it's not needed in the base type, only in the derived type.

jjonescz: Should we assert that it's never passed here, then? E.g., assert that it's equal to StartState, or make it int? position = null and assert that it's null here.

Or, if we never even call it for the base type, can it be abstract rather than virtual?

333fred: I don't think we can do any of that. The reason we need this is that the Roslyn-based tokenizer needs to be kept in sync as well. The other tokenizers don't have to do that work and don't care what position is passed; they just go off what the SeekableTextReader says is the current position. I considered moving all of the Reset code into here, but that felt like more trouble than it was worth.
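For readers following this thread, here is a minimal sketch of the shape being discussed: a base tokenizer whose Reset ignores the position argument because it trusts the reader, and a derived tokenizer that uses the argument to resynchronize state the reader does not track. Every type and member name below (SketchReader, SketchBaseTokenizer, SketchSyncedTokenizer, InnerLexerPosition, HasBufferedToken) is hypothetical and chosen for illustration. The idea that the derived type wraps a second lexer with its own cursor is an assumption, not something taken from the PR's code.

```csharp
using System;

// Hypothetical stand-in for a reader that owns the current position
// (the thread mentions SeekableTextReader; this is not that type).
public sealed class SketchReader
{
    public SketchReader(string text) { Text = text; }
    public string Text { get; }
    public int Position { get; set; }
}

// Hypothetical base tokenizer: it trusts the reader for its position,
// so Reset only clears local state and ignores the argument.
public class SketchBaseTokenizer
{
    public SketchBaseTokenizer(SketchReader source) { Source = source; }
    protected SketchReader Source { get; }
    public bool HasBufferedToken { get; protected set; }

    public virtual void Reset(int position)
    {
        // 'position' is intentionally unused in the base type.
        HasBufferedToken = false;
    }
}

// Hypothetical derived tokenizer. Assumption: it wraps a second lexer
// with its own cursor, which must be moved explicitly on every reset.
public sealed class SketchSyncedTokenizer : SketchBaseTokenizer
{
    public SketchSyncedTokenizer(SketchReader source) : base(source) { }
    public int InnerLexerPosition { get; private set; }

    public override void Reset(int position)
    {
        if (position < 0 || position > Source.Text.Length)
            throw new ArgumentOutOfRangeException(nameof(position));
        base.Reset(position);
        InnerLexerPosition = position; // keep the second cursor in sync
    }
}
```

In this sketch, calling Reset(3) on a SketchSyncedTokenizer moves its inner cursor to 3, while the same call on a SketchBaseTokenizer only clears buffered state. That is why the parameter cannot simply be asserted away or the method made abstract in a design like this one.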

jjonescz · 2024-08-02T09:36:32Z

...t.AspNetCore.Razor.Language/test/DefaultRazorIntermediateNodeLoweringPhaseIntegrationTest.cs

@@ -525,6 +525,7 @@ public DesignTimeOptionsFeature(bool designTime)
        public void Configure(RazorParserOptionsBuilder options)
        {
            options.SetDesignTime(_designTime);
+            options.UseRoslynTokenizer = true;


Can we also run the razor-toolset-ci pipeline (which tests a bunch of existing Razor projects) with the Roslyn tokenizer enabled?

333fred: I'll do this before we merge the feature branch, and address any breaks that come up with dedicated tests at that point.

chsienki

LGTM, a few comments on where things might be clearer + the usual smattering of BOM fun.

chsienki · 2024-08-08T18:11:27Z

src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs

                case SyntaxKind.LeftParenthesis:
-                    return "(";
                case SyntaxKind.RightParenthesis:


Consider pulling the assert cases out into a separate local function to make the actual logic clearer.

333fred: I'm honestly not sure what you mean here. How would that make the logic clearer?

333fred · 2024-08-09T20:30:06Z

@chsienki @jjonescz @DustinCampbell for reviews please.

jjonescz · 2024-08-12T09:00:10Z

src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/RoslynCSharpTokenizer.cs


 using SyntaxToken = Microsoft.AspNetCore.Razor.Language.Syntax.InternalSyntax.SyntaxToken;
 using SyntaxFactory = Microsoft.AspNetCore.Razor.Language.Syntax.InternalSyntax.SyntaxFactory;
 using CSharpSyntaxKind = Microsoft.CodeAnalysis.CSharp.SyntaxKind;
 using CSharpSyntaxToken = Microsoft.CodeAnalysis.SyntaxToken;
 using CSharpSyntaxTriviaList = Microsoft.CodeAnalysis.SyntaxTriviaList;
+using Microsoft.AspNetCore.Razor.PooledObjects;


Consider moving this using to the first group.

333fred: Will address in a follow-up.

333fred added 10 commits August 1, 2024 15:20

Remove unused ITokenizer interface, make tokenizers disposable.

894a9ba

Remove unneeded common states

a15d7cc

Allow setting the next tokenizer state after a razor comment body is tokenized.

3a8fd07

Plumb the Roslyn tokenizer through more tests

60ac8d1

Nullable enable the tokenizer

73aa717

Add consolidated syntax kinds. Since there aren't consumers that care about the individual differences between most C# operators, as well as the difference between C# numeric types, we'll just use a single kind in the new parser to keep things simpler.

dcef8ed

Give better error messages when syntax tree verification fails.

5749a8e

The big change: use the roslyn tokenizer for tokenizing C#.

2823f64

Update SyntaxKinds to the new consolidated kinds in tests.

c25ac20

Increase test coverage of a few scenarios, and bulk up existing tests.

34ab111

333fred requested review from a team as code owners August 1, 2024 23:01

333fred added 2 commits August 1, 2024 16:16

Skip test for now, give a better exception.

abd6aba

Name change

acf7a60

davidwengier removed the request for review from a team August 1, 2024 23:48

Better support giving errors on raw string literals, increase test coverage.

640d377

333fred commented Aug 2, 2024

DustinCampbell reviewed Aug 2, 2024

jjonescz reviewed Aug 2, 2024

jjonescz added the area-compiler label (Umbrella for all compiler issues) Aug 2, 2024

333fred added 2 commits August 2, 2024 14:36

PR feedback.

3155200

Convert to the correct type

1a0d57d

jjonescz reviewed Aug 5, 2024

Additional commenting and PR feedback. I've simplified the reset loop to just be from the back of the results, as that's the most common order for the parser to reset in. I've also refactored a common advance loop to reduce duplication.

aa24f69

chsienki approved these changes Aug 8, 2024

More feedback

5b7e927

jjonescz approved these changes Aug 12, 2024

chsienki approved these changes Aug 12, 2024

333fred merged commit 91bbfde into dotnet:features/roslyn-tokenizer Aug 12, 2024
12 checks passed

333fred deleted the roslyn-tokenizer branch August 12, 2024 18:19

This was referenced Nov 4, 2024

[Automated] PRs inserted in VS build main-35503.21 #11145

Closed

[Automated] PRs inserted in VS build feature.debugger.main-35507.203 #11186

Closed

dotnet-bot mentioned this pull request Nov 19, 2024

[Automated] PRs inserted in VS build feature.debugger.shadowDebug-35518.109 #11227

Closed
