Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow changing the newline sequence for indenting using Utf8JsonWriter #54715

Open
SteveKCheng opened this issue Jun 25, 2021 · 5 comments
Open
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Text.Json
Milestone

Comments

@SteveKCheng
Copy link

SteveKCheng commented Jun 25, 2021

Background and Motivation

Currently the newline sequence used by Utf8JsonWriter depends on the running platform, obtained from Environment.NewLine.

It would be useful to be able to set it explicitly so that JSON output can be reproduced exactly no matter what platform the .NET program is running under. It is essential when the JSON output bytes are to be compared directly or through a hash, e.g. as a digital signature or for regression-testing expected outputs.

Proposed API

namespace System.Text.Json
{
    public struct JsonWriterOptions
    {
+        public string? NewLine { get; set; }
    }
}

The default value is null (backwards compatible), which means to take the value from Environment.NewLine.
Otherwise the specified string is used as the newline sequence, as in the TextWriter.NewLine property.

Usage Examples

using var writer = new Utf8JsonWriter(outputStream, new JsonWriterOptions
{
    Indented = true,
    NewLine = "\n"
};

Alternatives

Currently as Utf8JsonWriter does not offer this functionality, I have to use Newtonsoft's JSON writer instead --- which relies on TextWriter that has a customizable NewLine property. It seems fairly difficult, and obviously inefficient, to write a "filter" on a Stream to convert the platform-specific newline byte sequence.

Risks

I think it is low risk for the most common cases. There is no change to behavior of existing code that does not set the JsonWriterOptions.NewLine property. Utf8JsonWriter's code is already parameterized on Environment.NewLine and this proposal is just asking we stop hard-coding the use of a global property.

What if the arbitrary newline sequence contains non-ASCII characters? Or it results in invalid JSON?

My first thought is that such characters should still be supported for completeness (otherwise we'd have to define what the behavior would be) even they might cause slightly slower execution. And it would be consistent with TextWriter's API with no friction for the user.

But perhaps an enumeration (or boolean?) allowing only for the standard choices of newline sequences would work too.
Currently, Utf8JsonWriter's code assumes Environment.NewLine is either "\r\n" or "\n". Supporting the choice between these two would be good enough to satisfy the motivation above.

@SteveKCheng SteveKCheng added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 25, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jun 25, 2021
@ghost
Copy link

ghost commented Jun 25, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and Motivation

Currently the newline sequence used by Utf8JsonWriter depends on the running platform, obtained from Environment.NewLine.

It would be useful to be able to set it explicitly so that JSON output can be reproduced exactly no matter what platform the .NET program is running under. It is essential when the JSON output bytes are to be compared directly or through a hash, e.g. as a digital signature or for regression-testing expected outputs.

Proposed API

namespace System.Text.Json
{
    public struct JsonWriterOptions
    {
+        public string? NewLine { get; set; }
    }
}

The default value is null (backwards compatible), which means to take the value from Environment.NewLine.
Otherwise the specified string is used as the newline sequence, as in the TextWriter.NewLine property.

Usage Examples

using var writer = new Utf8JsonWriter(outputStream, new JsonWriterOptions
{
    Indented = true,
    NewLine = "\n"
};

Alternatives

Currently as Utf8JsonWriter does not offer this functionality, I have to use Newtonsoft's JSON writer instead --- which relies on TextWriter which has a customizable NewLine property. It seems fairly difficult, and obviously inefficient, to try to write a "filter" on a Stream to convert the platform-specific newline byte sequence.

Risks

I think it is low risk for the most common cases. There is no change to behavior of existing code that does not set the JsonWriterOptions.NewLine property. Utf8JsonWriter's code is already parameterized on Environment.NewLine and this proposal is just asking we stop hard-coding the use of a global property.

What if the newline sequence contains non-ASCII characters? I think such characters should still be supported for completeness (otherwise we'd have to define what the behavior would be) even they might cause slightly slower execution. An enumeration could be used instead, allowing only for the standard choices of newline sequences, but an enumeration is rather noisy (I think in this case) and it would not be consistent with the existing API on TextWriter.

Author: SteveKCheng
Assignees: -
Labels:

api-suggestion, area-System.Text.Json, untriaged

Milestone: -

@GrabYourPitchforks
Copy link
Member

I don't have any feedback on the customizing newline scenario - will leave that for the JSON team. But I wanted to focus on this comment:

It is essential when the JSON output bytes are to be compared directly or through a hash, e.g. as a digital signature or for regression-testing expected outputs.

Please be advised that different major versions of System.Text.Json are not guaranteed to have byte-for-byte identical output with one another. This could be because the .NET runtime itself changes how some values are encoded (like how .NET Framework and .NET 5 have different string representations for double), it could be because characters like '€' may or may not be escaped by default, or it could be some other factor.

If you rely on the contents having a very specific byte-for-byte representation, please always test each new version of System.Text.Json (or its dependencies) or each new version of the .NET runtime in an integration environment before deploying to production. This will allow you to catch such discrepancies early.

Hope this helps!

@SteveKCheng
Copy link
Author

@GrabYourPitchforks
Just want to comment: I am fine with the caveat that byte-for-byte comparison only works with the same version of the run-time & libraries. The use case I have right now is regression-testing by comparing expected outputs where some developers might run the program on Windows while the automated periodic process is running on Linux.

@eiriktsarpalis eiriktsarpalis added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jul 16, 2021
@eiriktsarpalis eiriktsarpalis added this to the 7.0.0 milestone Jul 16, 2021
@eiriktsarpalis eiriktsarpalis added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration labels Oct 15, 2021
@eiriktsarpalis eiriktsarpalis removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Nov 16, 2021
@jeffhandley
Copy link
Member

@SteveKCheng We know that we aren't going to get to this feature during .NET 7, and compared to many of the other issues/features in our System.Text.Json backlog, I don't expect we'll get to this for the foreseeable future. To reflect this, I'm going to close the issue.

For the regression testing scenario you described, I recommend normalizing the newlines before comparison. As was previously called out, other direct comparisons or consumption for a hash, a digital signature, or other purpose would not be reliable without normalization either.

@ghost ghost locked as resolved and limited conversation to collaborators May 6, 2022
@jeffhandley
Copy link
Member

Reopening based on additional feedback from @damieng that there are scenarios where additional control of the written content is valuable.

@damieng if you could, please add a comment citing your scenarios so that we can consider them and if we proceed on this use them as acceptance criteria.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Text.Json
Projects
None yet
Development

No branches or pull requests

4 participants