Tracking issue for UTF-16 decoding iterators #27830
🔔 This issue is now entering its cycle-long final comment period for stabilization in 1.9 🔔
My personal opinion: Ok, so the APIs that this is talking about are:
I would not personally be a huge fan of stabilizing this cycle, some concerns being:
I think I'd be more comfortable with stabilizing given some utf-8 decoding functions as well, but it's probably worth also looking at the matrix of conversions we have:
I guess in that sense we'd be "complete" with
I don't see this as a concern, really. The more we work with iterators, the more natural things like "iterator transformers" such as this are. It's a fine ball to get rolling in my opinion.
Agreed.
Perhaps so -- but IMO this should influence organization more than anything. That is, we might want to think about a submodule for constants, if we do anticipate adding more over time. Most of the other points have the flavor of: why stabilize just this one piece? I agree that I'd really like to have an overall vision here; I feel like every cycle we stabilize a couple of related methods. That said, your matrix is pretty useful, and does indeed suggest we should land both this and an analogous utf8 decoder.
I would be in favor of postponing until we have a more complete vision here as well (I'm a consumer of UTF-8 functions like this, which I typically just implement in-crate). There may be other routines we want to consider as well, for example, decoding a UTF-8 sequence in reverse can often be useful.
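To make "decoding in reverse" concrete, here is a minimal sketch of the kind of in-crate helper meant above. The name `decode_last_utf8` is hypothetical and not a `std` API; it assumes the caller wants strict validation and leans on `std::str::from_utf8` to reject malformed sequences.

```rust
/// Hypothetical helper (not part of std): decodes the last code point of
/// `bytes` and returns it together with its encoded length in bytes.
/// Returns `None` for empty input or if the trailing bytes are not exactly
/// one well-formed UTF-8 sequence.
fn decode_last_utf8(bytes: &[u8]) -> Option<(char, usize)> {
    if bytes.is_empty() {
        return None;
    }
    // A UTF-8 sequence is at most 4 bytes long, so walk back over at most
    // three continuation bytes (bit pattern 10xxxxxx) to find the lead byte.
    let mut start = bytes.len() - 1;
    let limit = bytes.len().saturating_sub(4);
    while start > limit && (bytes[start] & 0b1100_0000) == 0b1000_0000 {
        start -= 1;
    }
    // Let the standard validator check the candidate sequence.
    let tail = std::str::from_utf8(&bytes[start..]).ok()?;
    let c = tail.chars().next()?;
    // The tail must consist of exactly one code point.
    if c.len_utf8() == bytes.len() - start {
        Some((c, c.len_utf8()))
    } else {
        None
    }
}
```

Calling it repeatedly and trimming the returned length off the end of the slice each time walks a buffer back to front; a lossy variant would need an extra decision about how many bytes to skip on error.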
Since
We can remove the constant if lossy decoding is built-in.
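For reference, this is roughly what lossy decoding looks like when it is built on top of the strict iterator plus the constant. It is a sketch only: `decode_utf16_lossy` is a hypothetical wrapper name, while `decode_utf16` and `REPLACEMENT_CHARACTER` are the `std::char` items under discussion.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Hypothetical wrapper (not a std API): decodes UTF-16 code units and
/// substitutes U+FFFD for every unpaired surrogate.
fn decode_utf16_lossy<I>(units: I) -> impl Iterator<Item = char>
where
    I: IntoIterator<Item = u16>,
{
    // Each item is Ok(char) or Err(DecodeUtf16Error); errors become U+FFFD.
    decode_utf16(units).map(|r| r.unwrap_or(REPLACEMENT_CHARACTER))
}
```

With a wrapper like this the caller never names the constant, which is the trade-off raised here: either the constant is public, or the lossy adaptor is.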
Yes, I do think we’re missing something lower-level than we currently have in But since I have some experiments at https://github.com/SimonSapin/rust-utf8. (If you’re interested, the commit history shows a number of different APIs I tried.) It supports "incremental" lossy decoding: input is a number of But this is significantly more API surface than, say, an iterator adaptor. And there’s probably a wide variety of use cases with slightly different constraints (@BurntSushi mentioned decoding in reverse), so I don’t know if it makes sense to try and support all of them in Still, it’d be nice to have a single UTF-8 decoding primitive
@SimonSapin Would
@BurntSushi Building something that yields
The libs team discussed this during triage yesterday and the conclusion was to stabilize essentially everything as-is modulo changing the error returned by the iterator. We felt that there's room for decoding an iterator of u8 to char, but we can always add that later.
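As an illustration of the shape that was stabilized, the sketch below uses `char::decode_utf16` together with `DecodeUtf16Error::unpaired_surrogate` (both listed in the 1.9 stabilization commit for this issue). The function name `utf16_to_string_strict` is made up for the example.

```rust
use std::char::decode_utf16;

/// Illustrative helper: decodes `units` strictly, returning the first
/// unpaired surrogate code unit as the error.
fn utf16_to_string_strict(units: &[u16]) -> Result<String, u16> {
    decode_utf16(units.iter().copied())
        // Convert the opaque error into the offending code unit.
        .map(|r| r.map_err(|e| e.unpaired_surrogate()))
        // Collecting into Result stops at the first Err.
        .collect()
}
```

The error type is opaque apart from its accessor, which leaves room to extend it later without breaking callers.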
std: Stabilize APIs for the 1.9 release

This commit applies all stabilizations, renamings, and deprecations that the library team has decided on for the upcoming 1.9 release. All tracking issues have gone through a cycle-long "final comment period" and the specific APIs stabilized/deprecated are:

Stable

* `std::panic`
* `std::panic::catch_unwind` (renamed from `recover`)
* `std::panic::resume_unwind` (renamed from `propagate`)
* `std::panic::AssertUnwindSafe` (renamed from `AssertRecoverSafe`)
* `std::panic::UnwindSafe` (renamed from `RecoverSafe`)
* `str::is_char_boundary`
* `<*const T>::as_ref`
* `<*mut T>::as_ref`
* `<*mut T>::as_mut`
* `AsciiExt::make_ascii_uppercase`
* `AsciiExt::make_ascii_lowercase`
* `char::decode_utf16`
* `char::DecodeUtf16`
* `char::DecodeUtf16Error`
* `char::DecodeUtf16Error::unpaired_surrogate`
* `BTreeSet::take`
* `BTreeSet::replace`
* `BTreeSet::get`
* `HashSet::take`
* `HashSet::replace`
* `HashSet::get`
* `OsString::with_capacity`
* `OsString::clear`
* `OsString::capacity`
* `OsString::reserve`
* `OsString::reserve_exact`
* `OsStr::is_empty`
* `OsStr::len`
* `std::os::unix::thread`
* `RawPthread`
* `JoinHandleExt`
* `JoinHandleExt::as_pthread_t`
* `JoinHandleExt::into_pthread_t`
* `HashSet::hasher`
* `HashMap::hasher`
* `CommandExt::exec`
* `File::try_clone`
* `SocketAddr::set_ip`
* `SocketAddr::set_port`
* `SocketAddrV4::set_ip`
* `SocketAddrV4::set_port`
* `SocketAddrV6::set_ip`
* `SocketAddrV6::set_port`
* `SocketAddrV6::set_flowinfo`
* `SocketAddrV6::set_scope_id`
* `<[T]>::copy_from_slice`
* `ptr::read_volatile`
* `ptr::write_volatile`
* The `#[deprecated]` attribute
* `OpenOptions::create_new`

Deprecated

* `std::raw::Slice` - use raw parts of `slice` module instead
* `std::raw::Repr` - use raw parts of `slice` module instead
* `str::char_range_at` - use slicing plus `chars()` plus `len_utf8`
* `str::char_range_at_reverse` - use slicing plus `chars().rev()` plus `len_utf8`
* `str::char_at` - use slicing plus `chars()`
* `str::char_at_reverse` - use slicing plus `chars().rev()`
* `str::slice_shift_char` - use `chars()` plus `Chars::as_str`
* `CommandExt::session_leader` - use `before_exec` instead.

Closes #27719
cc #27751 (deprecating the `Slice` bits)
Closes #27754
Closes #27780
Closes #27809
Closes #27811
Closes #27830
Closes #28050
Closes #29453
Closes #29791
Closes #29935
Closes #30014
Closes #30752
Closes #31262
cc #31398 (still need to deal with `before_exec`)
Closes #31405
Closes #31572
Closes #31755
Closes #31756
Is there any reason that these items are defined in I was investigating #49319 and ended up here.
I’ll respond in #49319 since this thread has been closed for two years :)
#27808 proposes exposing in `std::char` two iterator adaptors `Utf16Decoder` and `Utf16LossyDecoder`. This functionality was previously only available with an API that requires allocation (`String::from_utf16{,_lossy}`) or using the unstable `rustc_unicode` crate directly.

They are exposed unstable with a new `utf16_decoder` feature name. I’d like to stabilize them when we’re confident with the naming and API.
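To illustrate the allocation point, here is a small sketch that compares UTF-16 input against a `&str` without building an intermediate `String`. It uses `std::char::decode_utf16`, the name under which this adaptor appears in the 1.9 stabilization commit in this thread, rather than the original `Utf16Decoder` name; the helper `utf16_matches_str` is hypothetical.

```rust
use std::char::decode_utf16;

/// Hypothetical helper: returns true if `units` is well-formed UTF-16 that
/// decodes to exactly the characters of `s`. No String is allocated.
fn utf16_matches_str(units: &[u16], s: &str) -> bool {
    decode_utf16(units.iter().copied())
        // An unpaired surrogate becomes None, which never equals Some(c).
        .map(|r| r.ok())
        .eq(s.chars().map(Some))
}
```

The same comparison through `String::from_utf16` would allocate a buffer for the whole input before the first character could be compared.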