-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CsvTextFieldFormatter (#18) #19
Conversation
var specialChars = specialCharsCache; | ||
if (specialChars == null) | ||
{ | ||
specialChars = new SortedSet<char> { delimiterChar, quoteChar, quoteEscapeChar, '\n', '\r' }.ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, why go via SortedSet instead of simply new char[]{ delimiterChar, quoteChar, quoteEscapeChar, '\n', '\r'}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason is that the quote and quote escape character are the same character by default. The SortedSet will remove the duplicate.
It's not 100% essential to do that, as the IndexOfAny method won't break if there are duplicates. And there are other ways the duplicates could be avoided, like using the LINQ Distinct method or a more manual check.
{ | ||
get | ||
{ | ||
return new string[] { delimiterChar.ToString(CultureInfo.InvariantCulture) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use ToString(CultureInfo.InvariantCulture) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's to satisfy the .NET Analyzer warning CA1304: Specify CultureInfo.
I did not test it (yet) but looks promising. I added 2 comments. I miss option to escape always (or similar). Reason is that in case leading or trailing white space need to be preserved the spec says it should be quoted. I guess this problem can be solved in 2 ways: But I guess the logic won't be that hard, maybe like: Edit: I see since the parser has a bool TrimWhiteSpace, the bool in the formatter should simply be named PreserveWhiteSpace (leading\trailing implied) |
|
||
private void WriteField(string field) | ||
{ | ||
var isQuoteNecessary = field.IndexOfAny(SpecialChars) >= 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I suggested in the comment:
/// <summary>
/// Some csv consumers trim leading and trailing white space from fields unless they are escaped.
/// </summary>
public bool EscapeWhiteSpace { get; set; }
and
var isQuoteNecessary = field.IndexOfAny(SpecialChars) >= 0 ||
(EscapeWhiteSpace && field.Length > 0 && (char.IsWhiteSpace(field[0]) || char.IsWhiteSpace(field[field.Length - 1])));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been going back and forth on this in my head for a few days, and I think I've landed on preferring an option to always quote over an option that's just about whitespace handling.
My thoughts on why:
The parser in this library matches the behavior of Microsoft's VisualBasic TextFieldParser, and that parser doesn't consider quotes at all when it comes to whitespace trimming. Depending on the TrimWhiteSpace option, it either always preserves whitespace (even in unquoted fields) or always trims whitespace (even if the field is enclosed in quotes).
So if the goal of the formatter is to act as a companion to the parser, I don't think an option to quote for leading or trailing whitespace is a good fit.
However, I do think an option to always quote would be reasonable. The parser already has a "HasFieldsEnclosedInQuotes" option, so it already has a concept of a CSV file never using quotes. A corresponding "ForceFieldsEnclosedinQuotes" option would cover the cases where you want to use the formatter with other parsers that handle leading/trailing whitespace differently, and might even handle other situations where you'd want to require quotes (like maybe a parser collapses whitespace in the middle of a field if it's unquoted).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. The white space logic is too specific.
Adds a formatter to get a CSV file back from the parser's output.