`TextDocument` splits lines on undesired characters #86

akaihola · 2024-12-30T17:47:28Z

The `TextDocument.lines()` property uses `str.splitlines`. The Python documentation for `str.splitlines` says:

`str.splitlines(keepends=False)`

Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

This method splits on the following line boundaries. In particular, the boundaries are a superset of universal newlines.

| Representation | Description |
| --- | --- |
| `\n` | Line Feed |
| `\r` | Carriage Return |
| `\r\n` | Carriage Return + Line Feed |
| `\v` or `\x0b` | Line Tabulation |
| `\f` or `\x0c` | Form Feed |
| `\x1c` | File Separator |
| `\x1d` | Group Separator |
| `\x1e` | Record Separator |
| `\x85` | Next Line (C1 Control Code) |
| `\u2028` | Line Separator |
| `\u2029` | Paragraph Separator |

However, the Python interpreter uses a much narrower definition for end-of-line sequences for physical lines:

A physical line is a sequence of characters terminated by an end-of-line sequence. In source files and strings, any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform. The end of input also serves as an implicit terminator for the final physical line.

This causes problems in edge cases, like the IPython test case reported in akaihola/darker#768.
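
To illustrate the mismatch, here is a minimal sketch using a made-up two-line snippet (not the file from the referenced report). It assumes a form feed and a `\u2028` character sit inside string literals, so the interpreter would see two physical lines while `str.splitlines` sees four:

```python
# Hypothetical sample: two physical lines, each containing a character
# that str.splitlines treats as a line boundary but the interpreter does not.
source = "a = 'x\x0cy'\nb = 'p\u2028q'\n"

# str.splitlines breaks at \x0c and \u2028 as well as at \n.
split_by_method = source.splitlines()

# Splitting at \n only keeps each physical line intact.
# (Enough for this sample, which contains no \r.)
split_at_lf = source.split("\n")[:-1]

print(len(split_by_method))
print(len(split_at_lf))
```

Any code that maps line numbers between the two views would be off by the number of extra splits.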

A custom line splitter probably needs to be implemented and used instead of `str.splitlines`.
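
One possible shape for such a splitter is sketched below. This is only an illustration, not necessarily what was merged in #87; the name `split_physical_lines` is hypothetical, and the function assumes that only LF, CRLF and CR should terminate a line:

```python
import re

# Matches only the three end-of-line sequences the interpreter recognizes.
# CRLF is listed first so it is consumed as a single terminator.
_EOL = re.compile(r"\r\n|\r|\n")


def split_physical_lines(text: str, keepends: bool = False) -> list[str]:
    """Split text at LF, CRLF and CR only (hypothetical helper)."""
    lines = []
    start = 0
    for match in _EOL.finditer(text):
        # Include the terminator in the line only when keepends is true.
        end = match.end() if keepends else match.start()
        lines.append(text[start:end])
        start = match.end()
    # A final line without a terminator is still a line.
    if start < len(text):
        lines.append(text[start:])
    return lines
```

Like `str.splitlines`, this sketch returns an empty list for an empty string and does not add an empty trailing item when the text ends with a line terminator.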

akaihola added the bug label Dec 30, 2024

akaihola added this to the Darkgraylib 2.1.1 milestone Dec 30, 2024

akaihola self-assigned this Dec 30, 2024

akaihola added this to Darker and Graylint development Dec 30, 2024

This was referenced Dec 30, 2024

darker.verification.NotEquivalentError: nonascii2.py (IPython test file) akaihola/darker#768

Closed

Split newlines only at Python universal newlines (LF, CRLF, CR) #87

Merged

akaihola closed this as completed in #87 Jan 7, 2025

github-project-automation bot moved this to Done in Darker and Graylint development Jan 7, 2025
