Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LF line-ending auto format bug #10802

Merged
merged 12 commits into from
Sep 4, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -83,8 +83,10 @@ public async Task<TextEdit[]> FormatAsync(
var filteredEdits = range is null
? result.Edits
: result.Edits.Where(e => range.LineOverlapsWith(e.Range));
var minimalEdits = GetMinimalEdits(originalText, filteredEdits);
minimalEdits = NormalizeLineEndings(originalText, minimalEdits);
jordi1215 marked this conversation as resolved.
Show resolved Hide resolved

return GetMinimalEdits(originalText, filteredEdits);
return minimalEdits;
}

private static TextEdit[] GetMinimalEdits(SourceText originalText, IEnumerable<TextEdit> filteredEdits)
Expand Down Expand Up @@ -195,7 +197,6 @@ internal static TextEdit MergeEdits(TextEdit[] edits, SourceText sourceText)
{
return edits[0];
}

var textChanges = new List<TextChange>();
foreach (var edit in edits)
{
Expand Down Expand Up @@ -240,4 +241,48 @@ private static void UnwrapCSharpSnippets(TextEdit[] snippetEdits)
snippetEdit.NewText = unwrappedText;
}
}

// This method counts the occurrences of CRLF and LF line endings in the original text. If LF line endings are more
// prevalent, it removes any CR characters from the text edits to ensure consistency with the LF style.
// This can be removed once we figure out how to get proper line ending information from the client.
private TextEdit[] NormalizeLineEndings(SourceText originalText, TextEdit[] minimalEdits)
{
var (crlfCount, lfCount) = CountLineEndings(originalText);
if (lfCount > crlfCount)
jordi1215 marked this conversation as resolved.
Show resolved Hide resolved
{
minimalEdits = minimalEdits
.Select(edit =>
{
edit.NewText = edit.NewText.Replace("\r", string.Empty);
return edit;
})
.ToArray();
jordi1215 marked this conversation as resolved.
Show resolved Hide resolved
}

return minimalEdits;
}

private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This method could just return a bool and be called HasLFLineEndings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a couple of thoughts/questions about this method:

  1. Has SourceText.Lines already been populated at this point? I'd be surprised if it isn't, since this occurs after all of the formatting passes. Given that, would it be more efficient to just walk through the Lines collection and check the line ending? That would give you the exact span of each line ending, so you wouldn't need to walk every character in the SourceText.
  2. Should we be concerned about other legal "new line" characters? Here are the others that I know of:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this method does not access instance data and could be static. In fact, it's probably useful enough as a utility function to move it to SourceTextExtensions and make it an extension method on SourceText.

{
var crlfCount = 0;
var lfCount = 0;

for (var i = 0; i < sourceText.Length; i++)
{
if (sourceText[i] == '\r')
{
if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n')
{
crlfCount++;
i++; // Skip the next character as it's part of the CRLF sequence
}
}
else if (sourceText[i] == '\n')
{
lfCount++;
}
}

return (crlfCount, lfCount);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an algorithm like this, I might do a few things differently to improve the efficiency. You can take these suggestions or leave them! Or, you can take an entirely different approach! I just wanted to share a couple of opportunities:

  1. I would extract SourceText.Length to a local variable to avoid accessing it each iteration of the loop. SourceText.Length is an abstract property and is overridden by every SourceText implementation. So, I wouldn't expect the JIT to do any smart inlining here.
  2. I would capture the first character before the loop and then start the loop at index 1. That way, I could avoid the extra length check inside the loop for \r.
  3. I would likely extract sourceText[i] to a local variable or use a switch statement to avoid indexing extra times. This allows you to walk characters in pairs -- the previous and current character.
  4. This doesn't help with efficiency so much, but C# tuples are mutable structs. At the method body level, it can be useful to think of them as little packs of variables. So, you could just declare a single local at the top of the method for the result tuple and just update the fields. This won't affect efficiency and is largely a stylistic choice in this case, but I thought it might be useful to mention.
Suggested change
var crlfCount = 0;
var lfCount = 0;
for (var i = 0; i < sourceText.Length; i++)
{
if (sourceText[i] == '\r')
{
if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n')
{
crlfCount++;
i++; // Skip the next character as it's part of the CRLF sequence
}
}
else if (sourceText[i] == '\n')
{
lfCount++;
}
}
return (crlfCount, lfCount);
var result = (crlfCount: 0, lfCount: 0);
var length = sourceText.Length;
if (length == 0)
{
return result;
}
var previous = sourceText[0];
if (previous == '\n')
{
result.lfCount++;
}
for (var i = 1; i < length; i++)
{
var current = sourceText[i];
if (current == '\n')
{
if (previous == '\r')
{
result.crlfCount++;
// Skip ahead to avoid counting the '\n' again during the next iteration.
// However, we need to be careful not to index past the end of the SourceText!
if (++i < length)
{
// Set previous to the character *after* current. And, since we've already
// set previous, we can continue the loop. Otherwise, previous will get set
// to the wrong value below.
previous = sourceText[i];
continue;
}
}
else
{
result.lfCount++;
}
}
previous = current;
}
return result;

As I mentioned above, it might just be simpler (and more efficient) to walk through SourceText.Lines. However, I wanted to provide some suggestions in case you keep this algorithm. Also, please note that I have not tested this code! I wrote it directly into the GitHub comment field. So, please take the accuracy with a grain of salt. I've only had one cup of coffee this morning. 😄

}
}
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,44 @@ public interface Bar
""");
}

[Fact]
public async Task FormatsCodeBlockDirectiveWithCRLFLineEnding()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are great, and I was going to ask you to create a version of the Formats_CodeBlockDirectiveWithMarkup_NonBraced test as well (because it adds a newline with trailing whitespace, which these don't cover), but then I thought about it more, and I think it would be more interesting if FormattingTestBase was modified to replace \r\n with \n, and we run every test with both line endings.

ie, duplicate these two lines again, but with input and expected having had their line endings replaced with LF
https://github.com/dotnet/razor/blob/main/src/Razor/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/FormattingTestBase.cs#L54

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense! That would be a better way to widen the test coverage

{
await RunFormattingTestAsync(
input:
"@code {\r\n" +
" public class Foo{}\r\n" +
" public interface Bar {\r\n" +
"}\r\n" +
"}",
expected:
"@code {\r\n" +
" public class Foo { }\r\n" +
" public interface Bar\r\n" +
" {\r\n" +
" }\r\n" +
"}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding test helpers to add the line breaks for you. Something like these would make it a little less error prone to write new tests.

public static string CodeFromLines(string lineEnding, params string[] lines)
{
    return string.Join(lineEnding, lines);
}

Then, the tests can be written by passing in the line text without line endings like so:

Suggested change
await RunFormattingTestAsync(
input:
"@code {\r\n" +
" public class Foo{}\r\n" +
" public interface Bar {\r\n" +
"}\r\n" +
"}",
expected:
"@code {\r\n" +
" public class Foo { }\r\n" +
" public interface Bar\r\n" +
" {\r\n" +
" }\r\n" +
"}");
await RunFormattingTestAsync(
input: CodeFromLines("\r\n",
"@code {",
" public class Foo{}",
" public interface Bar {",
"}",
"}"),
expected: CodeFromLines("\r\n",
"@code {",
" public class Foo { }",
" public interface Bar",
" {",
" }",
"}");

Even better, since this is test code, efficiency is less worrisome than in production code. So, you could write a helper that uses a SourceText to parse the lines and use that to build up a new string with different line endings. Something like the following would do the trick (Note: untested code!).

public static string NormalizeLineEndings(string lineEnding, string code)
{
    using var _ = StringBuilderPool.GetPooledObject(out var builder);

    var text = SourceText.From(code);

    foreach (var line in text.Lines)
    {
        builder.Append(text.GetSubTextString(line.Span));

        if (line.End < text.Length)
        {
            builder.Append(lineEnding);
        }
    }

    return builder.ToString();
}

Then, the tests could continue using raw string literals like so:

Suggested change
await RunFormattingTestAsync(
input:
"@code {\r\n" +
" public class Foo{}\r\n" +
" public interface Bar {\r\n" +
"}\r\n" +
"}",
expected:
"@code {\r\n" +
" public class Foo { }\r\n" +
" public interface Bar\r\n" +
" {\r\n" +
" }\r\n" +
"}");
await RunFormattingTestAsync(
input: NormalizeLineEndings("\r\n", """
@code {
public class Foo{}
public interface Bar {
}
}
""",
expected: NormalizeLineEndings("\r\n", """
@code {
public class Foo { }
public interface Bar
{
}
}
""");

Even, even better, the test helper could be extension method on string.

public static string ChangeLineEndingsTo(this string code, string lineEnding);

}

[Fact]
public async Task FormatsCodeBlockDirectiveWithLFLineEnding()
{
await RunFormattingTestAsync(
input:
"@code {\n" +
" public class Foo{}\n" +
" public interface Bar {\n" +
"}\n" +
"}",
expected:
"@code {\n" +
" public class Foo { }\n" +
" public interface Bar\n" +
" {\n" +
" }\n" +
"}");
}

[Fact]
public async Task FormatCSharpInsideHtmlTag()
{
Expand Down Expand Up @@ -149,6 +187,64 @@ void Method()
""");
}

[Fact]
public async Task Formats_MultipleBlocksInADirectiveWithCRLFLineEnding()
{
await RunFormattingTestAsync(
input:
"@{\r\n" +
"void Method(){\r\n" +
"var x = \"foo\";\r\n" +
"@(DateTime.Now)\r\n" +
" <p></p>\r\n" +
"var y= \"fooo\";\r\n" +
"}\r\n" +
"}\r\n" +
"<div>\r\n" +
" </div>",
expected:
"@{\r\n" +
" void Method()\r\n" +
" {\r\n" +
" var x = \"foo\";\r\n" +
" @(DateTime.Now)\r\n" +
" <p></p>\r\n" +
" var y = \"fooo\";\r\n" +
" }\r\n" +
"}\r\n" +
"<div>\r\n" +
"</div>");
}

[Fact]
public async Task Formats_MultipleBlocksInADirectiveWithLFLineEnding()
{
await RunFormattingTestAsync(
input:
"@{\n" +
"void Method(){\n" +
"var x = \"foo\";\n" +
"@(DateTime.Now)\n" +
" <p></p>\n" +
"var y= \"fooo\";\n" +
"}\n" +
"}\n" +
"<div>\n" +
" </div>",
expected:
"@{\n" +
" void Method()\n" +
" {\n" +
" var x = \"foo\";\n" +
" @(DateTime.Now)\n" +
" <p></p>\n" +
" var y = \"fooo\";\n" +
" }\n" +
"}\n" +
"<div>\n" +
"</div>");
}

[Fact]
public async Task Formats_NonCodeBlockDirectives()
{
Expand Down
Loading