-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor core::char::EscapeDefault and co. structures #105076
Conversation
r? @scottmcm (rustbot has picked a reviewer for you, use r? to override) |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
☔ The latest upstream changes (presumably #105671) made this pull request unmergeable. Please resolve the merge conflicts. |
Hello @mina86! I just want to ping you as part of the triage procedure as this PR has merge conflicts :) |
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug} structures from using a state machine to computing escaped sequence upfront and during iteration just going through the characters. This is arguably simpler since it’s easier to think about having a buffer and start..end range to iterate over rather than thinking about a state machine. This also harmonises implementation of aforementioned iterators and core::ascii::EscapeDefault struct. This is done by introducing a new helper EscapeIterInner struct which holds the buffer and offers simple methods for iterating over range. As a side effect, this probably optimises Display implementation for those types since rather than calling write_char repeatedly, write_str is invoked once. On 64-bit platforms, it also reduces size of some of the structs: | Struct | Before | After | |----------------------------+--------+-------+ | core::char::EscapeUnicode | 16 | 12 | | core::char::EscapeDefault | 16 | 12 | | core::char::EscapeDebug | 16 | 16 | My ulterior motive and reason why I started looking into this is addition of as_str method to the iterators. With this change this will became trivial. It’s also going to be trivial to implement DoubleEndedIterator if that’s ever desired.
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for taking a bazillion years to review this. I like the approach of not bothering with an inline state machine for these -- it seems unlikely that people ever want just the first couple bytes without the rest, and just computing them straight-line upfront ought to be net much cheaper than when it's spread over multiple steps.
I've left a bunch of thoughts as I went through, but nothing drastic. Please go through and address them -- either with code changes or by replying to them with why you think the existing is better -- then we can get it landed!
@rustbot author
use crate::num::NonZeroUsize; | ||
use crate::ops::Range; | ||
|
||
const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curiosity: I see that the old escape_ascii
had hex_digits: &[u8; 16]
. Any idea if using it by value here (instead of the reference) makes any difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It most likely doesn’t matter here. I use array out of habit but I’m not even sure it’s relevant for Rust where everything is compiled statically and LTO is common.
@@ -0,0 +1,97 @@ | |||
//! Helper code for character escaping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pondering: it's not obvious to me that a new top-level module is the right place for this.
Maybe have it be ascii/escape.rs
instead, since it's always escaping things to ascii?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn’t want to put it inside ascii
is because of slight differences in escaping. '\0'.escape_default()
produces \u{0}
while 0u8.escape_default()
produces \x00
(also '\0'.escape_debug()
produces \0
). The way I was thinking about it is that std::ascii::escape_default
and char::escape_default
both use EscapeIterInner
so they are at the same ‘hierarchy’ meaning that EscapeIterInner
shouldn’t be inside std::ascii
.
But yes, I understand your concerns and I’m not completely convinced either. Maybe I’m just overthinking this?
Strangely if I try to update the branch I’m getting: $ ./x.py test library/core
...
error: two packages named `la-arena` in this workspace:
- /srv/mpn/rust/src/tools/rust-analyzer/lib/arena/Cargo.toml
- /srv/mpn/rust/src/tools/rust-analyzer/lib/la-arena/Cargo.toml @rustbot ready |
Thanks! @bors r+ |
Rollup of 7 pull requests Successful merges: - rust-lang#105076 (Refactor core::char::EscapeDefault and co. structures) - rust-lang#108161 (Add `ConstParamTy` trait) - rust-lang#108668 (Stabilize debugger_visualizer) - rust-lang#110512 (Fix elaboration with associated type bounds) - rust-lang#110895 (Remove `all` in target_thread_local cfg) - rust-lang#110955 (uplift `clippy::clone_double_ref` as `suspicious_double_ref_op`) - rust-lang#111048 (Mark`feature(return_position_impl_trait_in_trait)` and`feature(async_fn_in_trait)` as not incomplete) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.
This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.
This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct. This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.
As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once. On 64-bit platforms, it also reduces size of some of
the structs:
My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators. With this change this
will became trivial. It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.