Replace ASCII control chars with Unicode Control Pictures #127528
If we're concerned by legibility or by replacing with characters that might not be well supported in all terminals, we can also replace these with something like
|
0089595 to 4db25f7
These commits modify the If this was unintentional then you should revert the changes before this PR is merged. |
☔ The latest upstream changes (presumably #127777) made this pull request unmergeable. Please resolve the merge conflicts. |
4db25f7 to 42ed400
☔ The latest upstream changes (presumably #127819) made this pull request unmergeable. Please resolve the merge conflicts. |
```
error: bare CR not allowed in doc-comment
  --> $DIR/lex-bare-cr-string-literal-doc-comment.rs:3:32
   |
LL | /// doc comment with bare CR: '␍'
   |                                ^
```
42ed400 to ac6eb65
No longer track "zero-width" chars in `SourceMap`, read directly from the line when calculating the `display_col` of a `BytePos`. Move `char_width` to `rustc_span` and use it from the emitter.

This change allows the following to properly align in terminals (depending on the font, the replaced control codepoints are rendered as 1 or 2 width, on my terminal they are rendered as 1, on VSCode text they are rendered as 2):

```
error: this file contains an unclosed delimiter
  --> $DIR/issue-68629.rs:5:17
   |
LL | ␜␟ts␀![{i
   |       -- unclosed delimiter
   |       |
   |       unclosed delimiter
LL | ␀␀ fn rݻoa>rݻm
   |                ^
```
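As a rough illustration of computing a display column by reading the line directly, here is a minimal sketch. It is not rustc's code: the `char_width` below is a hypothetical, simplified stand-in for the function the commit moves to `rustc_span`, and it assumes tabs render as four columns and every other character (including replaced control characters) as one.

```rust
/// Hypothetical, simplified width function for this sketch only.
/// A real table would also handle wide and zero-width codepoints.
fn char_width(ch: char) -> usize {
    match ch {
        // Assumption: a tab is displayed as four spaces.
        '\t' => 4,
        // Assumption: everything else, including control characters shown
        // as Control Pictures, occupies a single column.
        _ => 1,
    }
}

/// Display column (0-based) of the byte offset `byte_pos` within `line`,
/// computed by walking the line itself instead of consulting a side table
/// of special characters.
fn display_col(line: &str, byte_pos: usize) -> usize {
    line.char_indices()
        // Only characters that start before the requested byte offset count.
        .take_while(|(idx, _)| *idx < byte_pos)
        .map(|(_, ch)| char_width(ch))
        .sum()
}
```

For example, `display_col("\tfoo", 1)` sums only the tab's width, while a multi-byte character before the offset still contributes a single column.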
We already point these out quite aggressively, telling people not to use them, but they would normally be rendered as nothing. Having them visible will make it easier for people to actually deal with them.

```
error: unicode codepoint changing visible direction of text present in literal
  --> $DIR/unicode-control-codepoints.rs:26:22
   |
LL |     println!("{:?}", '�');
   |                      ^-^
   |                      ||
   |                      |'\u{202e}'
   |                      this literal contains an invisible unicode text flow control codepoint
   |
   = note: these kind of unicode codepoints change the way text flows on applications that support them, but can cause confusion because they change the order of characters on the screen
   = help: if their presence wasn't intentional, you can remove them
help: if you want to keep them but make them visible in your source code, you can escape them
   |
LL |     println!("{:?}", '\u{202e}');
   |                       ~~~~~~~~
```

vs the previous

```
error: unicode codepoint changing visible direction of text present in literal
  --> $DIR/unicode-control-codepoints.rs:26:22
   |
LL |     println!("{:?}", '');
   |                      ^-
   |                      ||
   |                      |'\u{202e}'
   |                      this literal contains an invisible unicode text flow control codepoint
   |
   = note: these kind of unicode codepoints change the way text flows on applications that support them, but can cause confusion because they change the order of characters on the screen
   = help: if their presence wasn't intentional, you can remove them
help: if you want to keep them but make them visible in your source code, you can escape them
   |
LL |     println!("{:?}", '\u{202e}');
   |                       ~~~~~~~~
```
aa4f805 to 9dffe95
 error: unknown character escape: `\r`
   --> $DIR/trailing-carriage-return-in-string.rs:10:25
    |
-LL |     let bad = "This is \ a test";
+LL |     let bad = "This is \␍ a test";
    |                         ^ unknown character escape
This is confusing, because the problem is that the `\` escape is followed by a non-printable character. This is parsed as `\␍`, which is kind of `\\r`.
@bors rollup |
…i-obk

Replace ASCII control chars with Unicode Control Pictures

Replace ASCII control chars like `CR` with Unicode Control Pictures like `␍`:

```
error: bare CR not allowed in doc-comment
  --> $DIR/lex-bare-cr-string-literal-doc-comment.rs:3:32
   |
LL | /// doc comment with bare CR: '␍'
   |                                ^
```

Centralize the checking of unicode char width for the purposes of CLI display in one place. Account for the new replacements. Remove unneeded tracking of "zero-width" unicode chars, as we calculate these in the `SourceMap` as needed now.
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#126548 (Improved clarity of documentation for std::fs::create_dir_all)
 - rust-lang#127528 (Replace ASCII control chars with Unicode Control Pictures)
 - rust-lang#127717 (Fix malformed suggestion for repeated maybe unsized bounds)
 - rust-lang#128046 (Fix some `#[cfg_attr(not(doc), repr(..))]`)
 - rust-lang#128122 (Mark `missing_fragment_specifier` as `FutureReleaseErrorReportInDeps`)
 - rust-lang#128135 (std: use duplicate thread local state in tests)
 - rust-lang#128140 (Remove Unnecessary `.as_str()` Conversions)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors r=oli-obk |
…iaskrgr

Rollup of 5 pull requests

Successful merges:

 - rust-lang#127054 (Reorder trait bound modifiers *after* `for<...>` binder in trait bounds)
 - rust-lang#127528 (Replace ASCII control chars with Unicode Control Pictures)
 - rust-lang#127872 (Migrate `pointer-auth-link-with-c`, `c-dynamic-rlib` and `c-dynamic-dylib` `run-make` tests to rmake)
 - rust-lang#128111 (Do not use question as label)
 - rust-lang#128160 (Don't ICE when auto trait has assoc ty in old solver)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of rust-lang#127528 - estebank:ascii-control-chars, r=oli-obk

Replace ASCII control chars with Unicode Control Pictures

Replace ASCII control chars like `CR` with Unicode Control Pictures like `␍`:

```
error: bare CR not allowed in doc-comment
  --> $DIR/lex-bare-cr-string-literal-doc-comment.rs:3:32
   |
LL | /// doc comment with bare CR: '␍'
   |                                ^
```

Centralize the checking of unicode char width for the purposes of CLI display in one place. Account for the new replacements. Remove unneeded tracking of "zero-width" unicode chars, as we calculate these in the `SourceMap` as needed now.
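To make the replacement described above concrete, here is a minimal sketch of mapping ASCII control characters to their Control Pictures. It relies on the Unicode layout in which U+2400..=U+241F mirror the C0 controls in order and U+2421 is the picture for DEL. The function names `control_picture` and `make_visible` are hypothetical, and keeping `\n` and `\t` untouched is an assumption of this sketch, not a statement about the exact set of characters rustc replaces.

```rust
/// Map an ASCII control character to its Unicode Control Picture, if any.
fn control_picture(ch: char) -> Option<char> {
    match ch {
        // C0 controls U+0000..=U+001F map onto U+2400..=U+241F in order.
        '\u{00}'..='\u{1F}' => char::from_u32(0x2400 + ch as u32),
        // DEL has its own picture, U+2421.
        '\u{7F}' => Some('\u{2421}'),
        _ => None,
    }
}

/// Replace control characters in `s` with visible pictures.
/// Assumption of this sketch: newlines and tabs are left as they are.
fn make_visible(s: &str) -> String {
    s.chars()
        .map(|ch| match ch {
            '\n' | '\t' => ch,
            _ => control_picture(ch).unwrap_or(ch),
        })
        .collect()
}
```

With this sketch, a bare `\r` inside a line becomes `␍`, which matches the character shown in the diagnostic above.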
@rust-timer build e3343bd |
Finished benchmarking commit (e3343bd): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage)
Results (primary -3.2%, secondary -6.1%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles
Results (primary 1.2%, secondary 1.0%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size
Results (secondary -0.0%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 771.74s -> 769.585s (-0.28%)
This change seems to have a (hopefully unintended) side effect: at least rust-lld on Windows seems to end lines with CRLF, so every line of the rust-lld output ends with an ugly |
@bugadani I'd noticed that in one test but didn't think too much of it at the time. Yes, we should clean that up. |
…felix

Change output normalization logic to be linear against size of output

Modify the rendered output normalization routine to scan each character *once* and construct a `String` to be printed out to the terminal *once*, instead of using `String::replace` in a loop multiple times. The output doesn't change, but the time spent to prepare a diagnostic is now faster (or rather, closer to what it was before rust-lang#127528).
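The difference between the two strategies in this commit message can be sketched as follows. This is an illustration only: the `REPLACEMENTS` table and both function names are hypothetical and much smaller than whatever table the compiler actually uses.

```rust
/// Hypothetical replacement table for this sketch; not rustc's actual list.
const REPLACEMENTS: &[(char, &str)] = &[
    ('\t', "    "),
    ('\r', "␍"),
    ('\0', "␀"),
    ('\u{202E}', "\u{FFFD}"),
];

/// Looping approach: one full scan of the text, and one freshly allocated
/// `String`, for every entry in the table.
fn normalize_with_replace(s: &str) -> String {
    let mut out = s.to_string();
    for &(from, to) in REPLACEMENTS {
        out = out.replace(from, to);
    }
    out
}

/// Single-pass approach: each character is inspected once and appended to
/// a single output `String`.
fn normalize_single_pass(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match REPLACEMENTS.iter().find(|(from, _)| *from == ch) {
            Some((_, to)) => out.push_str(to),
            None => out.push(ch),
        }
    }
    out
}
```

Both functions produce the same text for this table because no replacement string contains a character that another entry would rewrite. If that property did not hold, the looping version could rewrite its own output, which is one more reason to prefer the single pass.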
Replace ASCII control chars like `CR` with Unicode Control Pictures like `␍`.

Centralize the checking of unicode char width for the purposes of CLI display in one place. Account for the new replacements. Remove unneeded tracking of "zero-width" unicode chars, as we calculate these in the `SourceMap` as needed now.