Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize string quotes #75

Merged
merged 2 commits into from
Mar 31, 2018
Merged

Normalize string quotes #75

merged 2 commits into from
Mar 31, 2018

Conversation

zsol
Copy link
Collaborator

@zsol zsol commented Mar 25, 2018

Convert simple double-quoted strings to single-quoted. Convert triple (single) quoted strings to triple (double) quoted. Do not touch any strings that have backslashes or quotes inside the string.

Fixes #51.

@carljm
Copy link
Collaborator

carljm commented Mar 25, 2018

FWIW, I am strongly opposed to this. It will take code that currently makes consistent use of quotes (eg a list of human-facing UI text strings that are all double quoted because they are likely to contain single-quote “apostrophes”, even though only some of them actually do), and make it needlessly inconsistent (the strings that happen to contain single quotes will remain double-quoted, the other ones will be converted to single quotes.) In addition to making the code uglier, this hurts maintainability; now I have to manually switch back to double quotes if I rewrite one of the strings in a way that adds a single quote.

I am not at all convinced that changing quoting of strings should even be considered in scope for Black at all. If we really think it is, let’s please think through a more thorough solution before pushing something actively harmful.

@zsol zsol force-pushed the normalize-string branch from 7ad277d to f8d7533 Compare March 25, 2018 19:16
@zsol
Copy link
Collaborator Author

zsol commented Mar 25, 2018

now I have to manually switch back to double quotes if I rewrite one of the strings in a way that adds a single quote.

IMO you should just add a \' to your string, then black should figure this out and convert the string to double quotes.

This discussion is relevant here: prettier/prettier#973

@ambv
Copy link
Collaborator

ambv commented Mar 26, 2018

I commented about the master plan on the issue. Here I will just say, which is relevant to the pull request, that Carl absolutely convinced me not to ship a partial solution in this case.

So, Zsolt, congratulations, you inherited a feature that is a bit bigger than just the first stab 😄

I also think if Black starts doing all three things in one update, it will cause less push back:

  1. change all strings to avoid backslash escapes;
  2. prefer single quotes (but triple double quotes) if no difference;
  3. if two string literals with the same prefix and quote type land on the same line with implicit concatenation, join them.

2 is already done. Now we have to look at 1, using Zsolt's specification table:

input formatted comment
'Hello' unchanged prefer single quotes
"Hello" 'Hello' prefer single quotes
"Don't do that" unchanged never introduce escapes
'Here is a "' unchanged prefer single quotes
'What\'s the deal here?' "What's the deal here?" remove as many escapes as possible
"What's the deal \"here\"?" 'What\'s the deal "here"? remove as many escapes as possible
"And \"here\"?" 'And "here"?' prefer single quotes
"""str with "" in them""" unchanged
'''Here's a ""''' unchanged See ^1
f"{rc} \"files\" reformatted." f'{rc} "files" reformatted.' See ^2
f'MOAR {" ".join([])}' unchanged prefer single quotes
f"MOAR {' '.join([])}" unchanged Tricky to do correctly. See ^3

^1 Note: if a quote touches a closing triple quote of the same kind, it is tokenized as implicit string concatenation. In other words, """Here's a """"" == """Here's a """ + ""

^2 Most f-strings won't embed any quotes within FormattedValues (e.g. the expressions within braces). So unless a quote of the same type appears inside an embedded expression, it's safe to do what we're doing with regular strings.

^3 I am responsible for the crappy fake parsing of f-strings in lib2to3 (e.g. we don't actually parse the embedded expressions). I'm sorry. But it was either that or no f-string support at all. Fortunately, for our purposes, we will only need to find embedded expressions (by matching pairs of non-escaped braces) to see if there are any quotes there. The grammar there is a bit tricky but there's relatively few rules:

>>> f'{2 + 2}'
'4'

>>> f'{{double brace is an escape}}'
'{double brace is an escape}'

>>> f'{ {1:"embedded", 3: "dict"} }'
"{1: 'embedded', 3: 'dict'}"

>>> f'\{2 + 2}'  # backslashes DON'T escape braces in f-strings
<stdin>:1: DeprecationWarning: invalid escape sequence \{
'\\4'

>>> f'\{2 + 2\}'
<stdin>:1: DeprecationWarning: invalid escape sequence \{
  File "<stdin>", line 1
SyntaxError: f-string expression part cannot include a backslash

@ambv ambv force-pushed the master branch 2 times, most recently from eeb3505 to 4b8823e Compare March 26, 2018 09:15
@zsol
Copy link
Collaborator Author

zsol commented Mar 26, 2018

Well I'm glad I'm not going to make things worse :) I expect to look at this again next weekend, stay tuned

@carljm
Copy link
Collaborator

carljm commented Mar 26, 2018

In terms of the test case I mentioned above, I don't think the full proposal outlined here improves anything. I don't want Black to ever turn this well-formatted and consistent code:

messages = {
    MESSAGE.one: "Please click the button.",
    MESSAGE.two: "Please don't look over there.",
    MESSAGE.three: "Stop and smell the roses.",
}

into this inconsistent mess:

messages = {
    MESSAGE.one: 'Please click the button.',
    MESSAGE.two: "Please don't look over there.",
    MESSAGE.three: 'Stop and smell the roses.',
}

because presence or absence of an apostrophe in a given message is an accidental and arbitrary, not inherent, characteristic of the string, and it's highly likely that some string in this list may be changed to include an apostrophe in the future. This is a concrete harm to both consistency and maintainability; the opposite of Black's goal. If a string changes from not having an apostrophe to having one, I either have to change its quoting manually or remember to type an escape, or else I've created a parsing error. And I have to first notice how the string is currently quoted, because Black has made them inconsistent where they were previously consistent. All in all, Black has directly made maintenance of this code noticeably worse than it was before; it's not a matter of aesthetic preference (on an aesthetic level I don't have any preference for one type of quote over another).

This issue would also be largely resolved if Black standardizes on double quotes instead of single. IMO the prevalence of apostrophes alone far outweighs every argument for preferring single quotes; embedded double quotes are orders of magnitude rarer in practice. The extra keystroke on US keyboards is a non-issue; if you find it easier to type a single quote, just type single quotes and let Black convert them.

@ambv
Copy link
Collaborator

ambv commented Mar 26, 2018

Ha, Carl used my own argument against me here... and so I agree with him. Let's default to double quotes always.

@lithammer
Copy link

I thought I would mention that we need to consider PEP 257 (Docstring Convention) as well, specifically this:

For consistency, always use """triple double quotes""" around docstrings.

zsol added 2 commits March 31, 2018 15:54
Convert simple double-quoted strings to single-quoted. Convert triple (single) quoted strings to triple (double) quoted. Do not touch any strings that have backslashes or quotes inside the string.

Fixes psf#51.
@zsol zsol force-pushed the normalize-string branch from f8d7533 to 4878893 Compare March 31, 2018 15:09
@coveralls
Copy link

Pull Request Test Coverage Report for Build 138

  • 152 of 171 (88.89%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 88.342%

Changes Missing Coverage Covered Lines Changed/Added Lines %
tests/test_black.py 55 58 94.83%
black.py 97 113 85.84%
Files with Coverage Reduction New Missed Lines %
black.py 1 87.0%
Totals Coverage Status
Change from base Build 136: 0.2%
Covered Lines: 1220
Relevant Lines: 1381

💛 - Coveralls

1 similar comment
@coveralls
Copy link

Pull Request Test Coverage Report for Build 138

  • 152 of 171 (88.89%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 88.342%

Changes Missing Coverage Covered Lines Changed/Added Lines %
tests/test_black.py 55 58 94.83%
black.py 97 113 85.84%
Files with Coverage Reduction New Missed Lines %
black.py 1 87.0%
Totals Coverage Status
Change from base Build 136: 0.2%
Covered Lines: 1220
Relevant Lines: 1381

💛 - Coveralls

@zsol
Copy link
Collaborator Author

zsol commented Mar 31, 2018

I pushed a new implementation that tries to minimize escaping while preferring " and """. A couple of caveats:

  • I punted on reformatting strings nested inside f-strings for now. I think we can add this on later, let's agree on the simple(r?) cases first
  • The code doesn't force """ for docstrings, but will lean towards them if it causes less escaping. This docstring is considered well-formatted, for example: '''Docstring "''', instead of reformatting it to """Docstring \""""

I tried adding important edge cases to the test case, let me know if I missed anything!

@ambv ambv merged commit 80bd2b3 into psf:master Mar 31, 2018
@ambv
Copy link
Collaborator

ambv commented Mar 31, 2018

This is great. Thanks! ✨ 🍰 ✨

We'll iterate on it if anything comes up but looks good. Now for some documentation and README updates.

@alanhamlett
Copy link

Don't use single quote inside human readable strings, use instead. Problem solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants