Use the roslyn tokenizer #10702

Merged
merged 17 commits into from
Aug 12, 2024

Conversation

Member

@333fred 333fred commented Aug 1, 2024

This is the big one: using the Roslyn tokenizer during Razor parsing. I've done my best to split the work into separate commits to make the review a bit simpler, but there's no getting around the lexer change being complicated. I would recommend reviewing commit-by-commit to keep it as simple as possible. Fixes #10568, fixes #7084.

@333fred 333fred requested review from a team as code owners August 1, 2024 23:01
@davidwengier davidwengier removed the request for review from a team August 1, 2024 23:48
Member Author

I would honestly not try to diff this against the old code. Pretty much every line is touched. Just review as if it was a new implementation.

Member

@DustinCampbell DustinCampbell left a comment

I did a first pass and sifted through all the test diffs. Nothing smelled funny in the new tokenizer, but I'll give it a closer look tomorrow.

Whitespace;[ ];
Identifier;[foo];
Whitespace;[ ];
Assign;[=];
Whitespace;[ ];
- StringLiteral;[@"blah LFblah; LF<p>Foo</p>LFblah LFblah];RZ1000(21:0,21 [1] )
+ StringLiteral;[@"blah LFblah; LF<p>Foo</p>LFblah LFblah];RZ1000(21:0,21 [2] )
Member

Why the difference here?

Member Author

Hmm, not sure. I'll look at this tomorrow.

Member Author

Ah, ok, this is part of handling strings better. The span for the diagnostic is now @", not just the @.
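To illustrate the baseline change discussed above, here is a minimal sketch of a diagnostic span that covers the `@"` opener rather than only the `@`. The sample text and the `OpenerSpan` helper are hypothetical and are not taken from the Razor code base; the sketch only shows why the bracketed length in the baseline goes from 1 to 2.

```csharp
using System;

// Hypothetical source text: the verbatim string opened at '@' is never closed.
var text = "foo = @\"blah";
int start = text.IndexOf('@');

// Old behaviour (as described in the thread): the span covers only the '@'.
var oldSpan = (Start: start, Length: 1);

// New behaviour: the span covers the '@' and the opening quote.
var newSpan = OpenerSpan(text, start);

Console.WriteLine($"old [{oldSpan.Length}] new [{newSpan.Length}]"); // prints "old [1] new [2]"

// Hypothetical helper: returns the span of the verbatim-string opener,
// including the quote when one immediately follows the '@'.
static (int Start, int Length) OpenerSpan(string source, int at)
{
    bool hasQuote = at + 1 < source.Length && source[at + 1] == '"';
    return (at, hasQuote ? 2 : 1);
}
```

The start position is unchanged in both cases, which matches the baseline: only the length component differs.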

@@ -119,7 +118,7 @@ SyntaxToken ITokenizer.NextToken()
return default(SyntaxToken);
}

- public void Reset()
+ public virtual void Reset(int position)
Member

Should the position parameter be used inside this method?

Member Author

No, it's not needed in the base type. Only in the derived type.

Member

@jjonescz jjonescz Aug 5, 2024

Should we assert that it's never passed here then? E.g., that it's equal to StartState or making it int? position = null and asserting that it's null here.

Or if we never even call it for the base type, can it be abstract rather than virtual?

Member Author

I don't think we can do any of that. The reason we need this for the roslyn-based tokenizer is that the roslyn-based tokenizer needs to be kept in sync as well. The other tokenizers don't have to do that work, and don't care what position is passed. They just go off what the SeekableTextReader says is the current position. I considered moving all of the Reset code into here, but that felt like more trouble than it was worth.
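The design described in this reply can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Razor implementation: `TextCursor`, `PlainTokenizer`, and `SyncedTokenizer` are hypothetical names standing in for the shared reader, a tokenizer that ignores `position`, and a tokenizer that has a second lexer to keep in sync.

```csharp
using System;

// Demo: the parser rewinds the shared reader, then resets both tokenizers.
var reader = new TextCursor("a = @\"x\";");
var plain = new PlainTokenizer(reader);
var synced = new SyncedTokenizer(reader);

reader.Position = 7;
synced.Advance();      // secondary lexer is now at 7
reader.Position = 3;   // the parser rewinds the shared reader
plain.Reset(3);        // position is ignored; the reader is authoritative
synced.Reset(3);       // secondary lexer is moved back as well
Console.WriteLine($"{plain.Current} {synced.Current} {synced.SecondaryPosition}"); // prints "3 3 3"

// Hypothetical stand-in for a seekable reader shared by the tokenizers.
sealed class TextCursor
{
    public TextCursor(string text) => Text = text;
    public string Text { get; }
    public int Position { get; set; }
}

// Base behaviour: state is re-read from the reader, so the position
// argument is accepted for the derived type's benefit but not used here.
class PlainTokenizer
{
    protected readonly TextCursor Reader;
    public PlainTokenizer(TextCursor reader) => Reader = reader;
    public int Current => Reader.Position;

    public virtual void Reset(int position)
    {
        // Intentionally unused in the base type.
    }
}

// Derived behaviour: a second lexer tracks its own position, which must be
// moved back explicitly whenever the parser rewinds.
sealed class SyncedTokenizer : PlainTokenizer
{
    public SyncedTokenizer(TextCursor reader) : base(reader) { }
    public int SecondaryPosition { get; private set; }

    public void Advance() => SecondaryPosition = Reader.Position;

    public override void Reset(int position)
    {
        base.Reset(position);
        SecondaryPosition = position;
    }
}
```

Keeping `Reset` virtual with a required parameter means callers never need to know which tokenizer they hold, at the cost of an argument the base type does not read.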

@@ -525,6 +525,7 @@ public DesignTimeOptionsFeature(bool designTime)
public void Configure(RazorParserOptionsBuilder options)
{
options.SetDesignTime(_designTime);
+ options.UseRoslynTokenizer = true;
Member

Can we also run the razor-toolset-ci pipeline (which tests a bunch of existing razor projects) with the roslyn tokenizer enabled?

Member Author

Will do this before we merge the feature branch and address any breaks that come up with dedicated tests at that point.

@jjonescz jjonescz added the area-compiler Umbrella for all compiler issues label Aug 2, 2024
… to just be from the back of the results, as that's the most common order for the parser to reset in. I've also refactored a common advance loop to reduce duplication.
Contributor

@chsienki chsienki left a comment

LGTM, a few comments on where things might be clearer + the usual smattering of BOM fun.

case SyntaxKind.LeftParenthesis:
return "(";
case SyntaxKind.RightParenthesis:
Contributor

Consider pulling the assert cases out into a separate local function to make the actual logic clearer.

Member Author

I'm honestly not sure what you mean here. How would that make the logic clearer?

Member Author

333fred commented Aug 9, 2024

@chsienki @jjonescz @DustinCampbell for reviews please.


using SyntaxToken = Microsoft.AspNetCore.Razor.Language.Syntax.InternalSyntax.SyntaxToken;
using SyntaxFactory = Microsoft.AspNetCore.Razor.Language.Syntax.InternalSyntax.SyntaxFactory;
using CSharpSyntaxKind = Microsoft.CodeAnalysis.CSharp.SyntaxKind;
using CSharpSyntaxToken = Microsoft.CodeAnalysis.SyntaxToken;
using CSharpSyntaxTriviaList = Microsoft.CodeAnalysis.SyntaxTriviaList;
using Microsoft.AspNetCore.Razor.PooledObjects;
Member

Consider moving this using to the first group.

Member Author

Will address in a follow up.

@333fred 333fred merged commit 91bbfde into dotnet:features/roslyn-tokenizer Aug 12, 2024
12 checks passed
Labels
area-compiler Umbrella for all compiler issues
4 participants