TextDocument splits lines on undesired characters #86

Closed
akaihola opened this issue Dec 30, 2024 · 0 comments · Fixed by #87
Labels
bug Something isn't working

@akaihola
Owner

The TextDocument.lines property splits text with str.splitlines(). The Python documentation for str.splitlines says:

str.splitlines(keepends=False)

Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

This method splits on the following line boundaries. In particular, the boundaries are a superset of universal newlines.

| Representation | Description |
|---|---|
| \n | Line Feed |
| \r | Carriage Return |
| \r\n | Carriage Return + Line Feed |
| \v or \x0b | Line Tabulation |
| \f or \x0c | Form Feed |
| \x1c | File Separator |
| \x1d | Group Separator |
| \x1e | Record Separator |
| \x85 | Next Line (C1 Control Code) |
| \u2028 | Line Separator |
| \u2029 | Paragraph Separator |
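A quick check confirms that every separator in the table above makes str.splitlines() break the string, including characters that are not line terminators in Python source code:

```python
# Every separator listed in the table makes str.splitlines() split the string.
separators = ["\n", "\r", "\r\n", "\v", "\f", "\x1c", "\x1d", "\x1e", "\x85", "\u2028", "\u2029"]
for sep in separators:
    assert f"a{sep}b".splitlines() == ["a", "b"], repr(sep)

# With keepends=True the separator stays attached to the line it terminates.
print("a\fb".splitlines(keepends=True))  # ['a\x0c', 'b']
```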

However, the Python interpreter uses a much narrower definition of end-of-line sequences for physical lines:

A physical line is a sequence of characters terminated by an end-of-line sequence. In source files and strings, any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform. The end of input also serves as an implicit terminator for the final physical line.
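The mismatch can be made visible by comparing the line numbers the parser reports with the indices from str.splitlines(). This is a minimal sketch; the input (a form feed inside a comment) is an assumed example, not the IPython test case from darker#768:

```python
import ast

# Hypothetical source: a form feed (\x0c) inside a comment, then a real newline.
source = "x = 1  # note\x0cmore\ny = 2\n"

tree = ast.parse(source)
# The parser sees two physical lines: the assignment to y is on line 2.
y_assign = tree.body[1]
print(y_assign.lineno)  # 2

# str.splitlines() also breaks at the form feed and sees three lines,
# so "line 2" points at a fragment of the comment instead.
lines = source.splitlines()
print(len(lines))  # 3
print(lines[y_assign.lineno - 1])  # more
```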

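This mismatch causes problems in edge cases, such as the IPython test case reported in akaihola/darker#768: for text containing any of the extra separators, str.splitlines returns more lines than the interpreter sees, so line numbers no longer agree.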

A custom line splitter that breaks only on LF, CR LF and CR probably needs to be implemented and used instead of str.splitlines.
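One possible shape for such a splitter is sketched below. The name split_physical_lines is hypothetical and this is not necessarily what the fix in #87 does; it only assumes that the goal is to mirror str.splitlines() (including the keepends flag and the handling of a trailing terminator) while recognizing just the three physical-line terminators:

```python
import re

# Only the terminators Python uses for physical lines.
# CR LF must come before CR so that "\r\n" is matched as one terminator.
_PHYSICAL_EOL = re.compile(r"\r\n|\r|\n")


def split_physical_lines(text: str, keepends: bool = False) -> list[str]:
    """Hypothetical helper: like str.splitlines(), but only on LF, CR LF and CR."""
    lines = []
    start = 0
    for match in _PHYSICAL_EOL.finditer(text):
        # Include the terminator only when keepends is true.
        end = match.end() if keepends else match.start()
        lines.append(text[start:end])
        start = match.end()
    # Like str.splitlines(), a final terminator does not produce an empty line.
    if start < len(text):
        lines.append(text[start:])
    return lines


print(split_physical_lines("x = 1  # note\fmore\ny = 2\n"))
# ['x = 1  # note\x0cmore', 'y = 2']
```

Using finditer() instead of re.split() keeps the terminators available for the keepends=True case, which a document class may need in order to reproduce the original line endings.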
