Format hex code in unicode escape sequences in string literals #2916

Merged (24 commits) on Jan 22, 2023

Shivansh-007 (Contributor)

Closes #2067
Closes #2828

Checklist - did you ...

  • Add a CHANGELOG entry if necessary?
  • Add / update tests if necessary?
  • Add new / update outdated documentation? -> n/a

@felix-hilden (Collaborator) left a comment

Thanks for the PR once again! A comment and some nits below 👍 Let's discuss.

src/black/linegen.py (outdated, resolved)
src/black/mode.py (outdated, resolved)


```python
def normalize_unicode_escape_sequences(leaf: Leaf) -> None:
    """Replace hex codes in Unicode escape sequences with lowercase representation."""
```
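The function under review appears here only as a signature. As a rough, self-contained sketch of the behavior its docstring describes, the following operates on the source text of a single string literal rather than on a blib2to3 `Leaf`. It is not Black's implementation: the helper name `normalize_escapes`, its regex, skipping bytes and raw literals, and uppercasing `\N{...}` names are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (not Black's): a run of backslashes, then one escape body.
# The \N{...} name alphabet (letters, digits, space, hyphen; two or more characters)
# is an assumption based on the character-name analysis in this thread.
ESCAPE_RE = re.compile(
    r"(?P<backslashes>\\+)"
    r"(?P<body>"
    r"x(?P<x>[0-9A-Fa-f]{2})"
    r"|u(?P<u>[0-9A-Fa-f]{4})"
    r"|U(?P<U>[0-9A-Fa-f]{8})"
    r"|N\{(?P<N>[A-Za-z0-9 \-]{2,})\}"
    r")"
)


def normalize_escapes(literal: str) -> str:
    """Lowercase the hex digits of \\x, \\u and \\U escapes; uppercase \\N{...} names.

    `literal` is the source text of one string literal, prefix and quotes included.
    """
    prefix = re.match(r"[A-Za-z]*", literal).group().lower()
    if "b" in prefix or "r" in prefix:
        # Bytes have no \u, \U or \N escapes and raw strings have no escapes at all,
        # so rewriting either could change the literal's value.
        return literal

    def replace(m: re.Match) -> str:
        backslashes, body = m.group("backslashes"), m.group("body")
        if len(backslashes) % 2 == 0:
            # An even run is all escaped backslashes, so `body` is plain text.
            return backslashes + body
        if m.group("N") is not None:
            return backslashes + "N{" + m.group("N").upper() + "}"
        # \x, \u, \U: keep the escape letter, lowercase only the hex digits.
        return backslashes + body[0] + body[1:].lower()

    return ESCAPE_RE.sub(replace, literal)


source = '"\\xAB \\u00E9 \\N{latin small letter e with acute}"'
print(normalize_escapes(source))  # "\xab \u00e9 \N{LATIN SMALL LETTER E WITH ACUTE}"
```

Because hex digits and character names in escapes are case-insensitive, this rewrite never changes a literal's value; the backslash-parity and prefix checks are what keep it from touching text that merely looks like an escape, such as `\\xAB` or `r"\xAB"`.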
Collaborator

This still needs to be thought through, as this comment points out. My two cents: I prefer uppercase, and since Black already formats hex numbers with uppercase digits, I think it would be consistent. The Python repr argument is solid too, but then we should think about changing hex literals as well.
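For context on the two conventions being weighed: Black already normalizes hex number literals to uppercase digits with a lowercase prefix (`0XABCDEF` becomes `0xABCDEF`), while Python's own `repr()` and `hex()` use lowercase hex digits. The check below covers only the standard-library side and does not run Black; the sample characters are arbitrary picks.

```python
# Standard-library behavior only; this does not run Black.
# repr() escapes non-printable characters using lowercase hex digits.
samples = {
    "\x1B": r"'\x1b'",  # ESC, a C0 control character
    "\uFEFF": r"'\ufeff'",  # ZERO WIDTH NO-BREAK SPACE, a format character
    "\U000F0000": r"'\U000f0000'",  # first code point of a private-use plane
}
for char, expected in samples.items():
    assert repr(char) == expected, (repr(char), expected)

assert repr(b"\xAB") == r"b'\xab'"  # bytes repr follows the same convention
assert hex(0xABCDEF) == "0xabcdef"  # so does the hex() builtin
```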

Collaborator

I'd rather not change hex numbers; we've already changed our minds there a few times.

Collaborator

So if we're not changing numbers (which I agree with), do y'all share the concern for consistency?

Collaborator

My comments read a bit ambiguously, so to be clear: I'm proposing that we switch the formatting to uppercase to be consistent with hex numbers. Y'all in?

src/black/strings.py (three outdated, resolved threads)

test.py (outdated, resolved)

github-actions bot commented Mar 16, 2022

diff-shades results comparing this PR (1511959) to main (4e3303f). The full diff is available in the logs under the "Generate HTML diff report" step.

╭──────────────────────── Summary ────────────────────────╮
│ 5 projects & 38 files changed / 290 changes [+145/-145] │
│                                                         │
│ ... out of 2 363 850 lines, 11 046 files & 23 projects  │
╰─────────────────────────────────────────────────────────╯

Differences found.

src/black/strings.py (outdated, resolved)
JelleZijlstra self-assigned this Mar 24, 2022
Co-authored-by: Jelle Zijlstra <[email protected]>
@ichard26 (Collaborator) left a comment

I won't comment on the formatting style itself, but I have quite a few other suggestions. This may be too minor, but I'd recommend checking that this change is covered in the Black code style documentation!

Thanks again!

src/black/strings.py (resolved)
src/black/strings.py (three outdated, resolved threads)
@ichard26 (Collaborator) left a comment

I forgot to mark my review as "Request changes", which is relevant since the code in this PR can still crash.

JelleZijlstra removed their assignment Apr 2, 2022
@ichard26 (Collaborator)

Hi @Shivansh-007, are you still able to and interested in working on this PR? If not, just lemme know and I'd be happy to pick it up!

JelleZijlstra removed the "help wanted" and "S: up for grabs (PR only)" labels Dec 18, 2022
JelleZijlstra self-assigned this Dec 18, 2022
@JelleZijlstra (Collaborator)

I brought this PR up to date, applied @ichard26's review suggestions, and fixed a few more things I noticed. I think this PR is now good to go, unless we change our minds and go with uppercase (#2067).

@JelleZijlstra (Collaborator)

I determined the legal characters in \N escapes by running something like [unicodedata.name(chr(i)) for i in range(65536)] (skipping code points that have no name) and taking the set of all characters in the output. Name lengths ranged from 3 to 83. However, \N also accepts aliases, and I'm not sure how to get a list of all of them; the Python docs point to https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt, but that doesn't include the "ox" alias for 🐂. I manually verified that there are no one-character aliases.
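On the alias question raised above: `unicodedata.name()` reports only a character's primary name, while `\N{...}` escapes and `unicodedata.lookup()` also resolve the aliases listed in NameAliases.txt (supported since Python 3.3). The small check below assumes two aliases from that file, `LINE FEED` for U+000A and the correction alias `LATIN CAPITAL LETTER GHA` for U+01A2; it also shows that name matching ignores case.

```python
import unicodedata

# Control characters have no primary name, so name() needs a default here...
assert unicodedata.name("\n", None) is None
# ...but \N{...} and lookup() accept the control-character alias.
assert "\N{LINE FEED}" == "\n"
assert unicodedata.lookup("LINE FEED") == "\n"

# U+01A2 keeps its original (misnamed) primary name; the correction alias also resolves.
assert unicodedata.name("\u01a2") == "LATIN CAPITAL LETTER OI"
assert "\N{LATIN CAPITAL LETTER GHA}" == "\u01a2"

# Name matching is case-insensitive, so changing the case of a \N{...} name
# does not change the string's value.
assert "\N{ox}" == "\N{OX}" == "\U0001F402"
```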

@Jackenmen (Contributor)

> However, \N also accepts aliases and I'm not sure how to get a list of all of those; the Python docs point to unicode.org/Public/14.0.0/ucd/NameAliases.txt but that doesn't include the "ox" alias for 🐂

"ox" is the base name of 🐂 (U+1F402), so it is returned by unicodedata.name().

@JelleZijlstra (Collaborator)

Ah, thanks; I should have gone past 65536 to include astral characters. That widens the length range to 2 through 88 but doesn't add any new characters to the set of characters that appear in names.

@JelleZijlstra (Collaborator)

Also, the longest names are:

```
In [13]: [n for n in names if len(n) > 80]
Out[13]:
['ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM',
 'ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT AND MIDDLE RIGHT TO LOWER CENTRE',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE RIGHT AND MIDDLE LEFT TO LOWER CENTRE',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE RIGHT TO LOWER CENTRE TO MIDDLE LEFT',
 'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT TO LOWER CENTRE TO MIDDLE RIGHT',
 'BOX DRAWINGS LIGHT DIAGONAL MIDDLE LEFT TO UPPER CENTRE TO MIDDLE RIGHT TO LOWER CENTRE',
 'BOX DRAWINGS LIGHT DIAGONAL MIDDLE RIGHT TO UPPER CENTRE TO MIDDLE LEFT TO LOWER CENTRE']
```
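The exploration in these comments can be reproduced with a short script along these lines. It is a sketch, not the exact code used in the thread: it scans the whole code point range (up to `sys.maxunicode`, so astral characters are included) and looks only at primary names, since the `unicodedata` module offers no obvious way to enumerate aliases. Counts depend on the Unicode version bundled with the running Python (`unicodedata.unidata_version`), so they may differ from the numbers quoted above.

```python
import sys
import unicodedata


def primary_names() -> list[str]:
    """Primary names of all named code points, astral planes included."""
    names = []
    for code_point in range(sys.maxunicode + 1):
        # Without a default, name() raises ValueError for unnamed code points.
        name = unicodedata.name(chr(code_point), None)
        if name is not None:
            names.append(name)
    return names


names = primary_names()
alphabet = set("".join(names))
lengths = [len(name) for name in names]
longest = [name for name in names if len(name) > 80]

print("Unicode data version:", unicodedata.unidata_version)
print("characters used in names:", repr("".join(sorted(alphabet))))
print("name lengths:", min(lengths), "to", max(lengths))
print("names longer than 80 characters:", longest)
```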

JelleZijlstra merged commit eabff67 into psf:main Jan 22, 2023
copybara-service bot pushed a commit to google/pyink that referenced this pull request Feb 6, 2023
Noticeable style changes:

1. Parenthesize multiple context managers psf#3489.

The following style changes are temporarily disabled when `--preview` is used together with `--pyink`:

2. Format unicode escape sequences psf#2916.
3. Parenthesize conditional expressions psf#2278.

PiperOrigin-RevId: 507485670
luketainton pushed a commit to luketainton/PwnedPW that referenced this pull request Feb 10, 2025
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [black](https://github.com/psf/black) ([changelog](https://github.com/psf/black/blob/main/CHANGES.md)) | dependency-groups | major | `<25.0.0,>=24.10.0` -> `<25.2.0,>=25.1.0` |

---

### Release Notes

<details>
<summary>psf/black (black)</summary>

### [`v25.1.0`](https://github.com/psf/black/blob/HEAD/CHANGES.md#2510)

[Compare Source](psf/black@24.10.0...25.1.0)

##### Highlights

This release introduces the new 2025 stable style ([#4558](psf/black#4558)), stabilizing
the following changes:

-   Normalize casing of Unicode escape characters in strings to lowercase ([#2916](psf/black#2916))
-   Fix inconsistencies in whether certain strings are detected as docstrings ([#4095](psf/black#4095))
-   Consistently add trailing commas to typed function parameters ([#4164](psf/black#4164))
-   Remove redundant parentheses in if guards for case blocks ([#4214](psf/black#4214))
-   Add parentheses to if clauses in case blocks when the line is too long ([#4269](psf/black#4269))
-   Whitespace before `# fmt: skip` comments is no longer normalized ([#4146](psf/black#4146))
-   Fix line length computation for certain expressions that involve the power operator ([#4154](psf/black#4154))
-   Check if there is a newline before the terminating quotes of a docstring ([#4185](psf/black#4185))
-   Fix type annotation spacing between `*` and more complex type variable tuple ([#4440](psf/black#4440))

The following changes were not in any previous release:

-   Remove parentheses around sole list items ([#4312](psf/black#4312))
-   Generic function definitions are now formatted more elegantly: parameters are
    split over multiple lines first instead of type parameter definitions ([#4553](psf/black#4553))

##### Stable style

-   Fix formatting cells in IPython notebooks with magic methods and starting or trailing
    empty lines ([#4484](psf/black#4484))
-   Fix crash when formatting `with` statements containing tuple generators/unpacking
    ([#4538](psf/black#4538))

##### Preview style

-   Fix/remove string merging changing f-string quotes on f-strings with internal quotes
    ([#4498](psf/black#4498))
-   Collapse multiple empty lines after an import into one ([#4489](psf/black#4489))
-   Prevent `string_processing` and `wrap_long_dict_values_in_parens` from removing
    parentheses around long dictionary values ([#4377](psf/black#4377))
-   Move `wrap_long_dict_values_in_parens` from the unstable to preview style ([#4561](psf/black#4561))

##### Packaging

-   Store license identifier inside the `License-Expression` metadata field, see
    [PEP 639](https://peps.python.org/pep-0639/). ([#4479](psf/black#4479))

##### Performance

-   Speed up the `is_fstring_start` function in Black's tokenizer ([#4541](psf/black#4541))

##### Integrations

-   If using stdin with `--stdin-filename` set to a force excluded path, stdin won't be
    formatted. ([#4539](psf/black#4539))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.tainton.uk/repos/PwnedPW/pulls/283
Reviewed-by: Luke Tainton <[email protected]>
Co-authored-by: Renovate [BOT] <[email protected]>
Co-committed-by: Renovate [BOT] <[email protected]>
Labels: F: strings (Related to our handling of strings), T: style (What do we want Blackened code to look like?)
Projects: none yet
5 participants