as_mut_vec_for_path_buf in windows breaks UTF-8 is_known_utf8 assumption #126291

Closed
yhx-12243 opened this issue Jun 12, 2024 · 1 comment · Fixed by #126305
Labels
C-bug - Category: This is a bug.
I-unsound - Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness
O-windows - Operating system: Windows
T-libs - Relevant to the library team, which will review and decide on the PR/issue.

Comments

yhx-12243 commented Jun 12, 2024

pub struct Wtf8Buf {
    bytes: Vec<u8>,
    /// Do we know that `bytes` holds a valid UTF-8 encoding? We can easily
    /// know this if we're constructed from a `String` or `&str`.
    ///
    /// It is possible for `bytes` to have valid UTF-8 without this being
    /// set, such as when we're concatenating `&Wtf8`'s and surrogates become
    /// paired, as we don't bother to rescan the entire string.
    is_known_utf8: bool,
}

pub(crate) fn as_mut_vec_for_path_buf(&mut self) -> &mut Vec<u8> {
    &mut self.bytes
}
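
To make the problem with the two snippets above concrete, here is a minimal, platform-independent sketch. `ModelBuf` and its methods are hypothetical stand-ins invented for illustration, not the standard library's `Wtf8Buf`; the only thing borrowed from the real code is the shape of the accessor, which returns the byte vector without touching the flag.

```rust
// Hypothetical, simplified stand-in for Wtf8Buf (not the std implementation).
struct ModelBuf {
    bytes: Vec<u8>,
    is_known_utf8: bool,
}

impl ModelBuf {
    // Built from a String, so the flag can safely start out as true.
    fn from_string(s: String) -> Self {
        ModelBuf { bytes: s.into_bytes(), is_known_utf8: true }
    }

    // Same shape as as_mut_vec_for_path_buf: exposes the bytes mutably
    // and leaves is_known_utf8 untouched.
    fn as_mut_vec(&mut self) -> &mut Vec<u8> {
        &mut self.bytes
    }

    // True when the flag does not promise more than the bytes deliver.
    fn flag_matches_contents(&self) -> bool {
        !self.is_known_utf8 || std::str::from_utf8(&self.bytes).is_ok()
    }
}

// Returns (consistent before the append, consistent after the append).
fn demo_stale_flag() -> (bool, bool) {
    let mut buf = ModelBuf::from_string("utf8".to_owned());
    let before = buf.flag_matches_contents();
    // ".non" + the three-byte WTF-8 form of the lone surrogate U+D800 + "utf8".
    buf.as_mut_vec().extend_from_slice(b".non\xED\xA0\x80utf8");
    let after = buf.flag_matches_contents();
    (before, after)
}

fn main() {
    let (before, after) = demo_stale_flag();
    println!("flag consistent before append: {before}");
    println!("flag consistent after append:  {after}");
}
```

In this model the flag still claims UTF-8 after the append, although the bytes no longer are. Any later conversion that trusts the flag instead of rescanning would hand out an invalid `String`.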

I tried this code:

use std::{ffi::OsString, os::windows::ffi::OsStringExt, path::PathBuf};

fn f() -> Result<String, OsString> {
    let mut utf8 = PathBuf::from(OsString::from("utf8".to_owned()));
    let non_utf8: OsString = OsStringExt::from_wide(&[0x6e, 0x6f, 0x6e, 0xd800, 0x75, 0x74, 0x66, 0x38]);
    utf8.set_extension(&non_utf8);
    utf8.into_os_string().into_string()
}

fn main() {
    dbg!(f());
}

I expected to see this happen:

[1.rs:11:5] f() = Err(
    "utf8.non\xED\xA0\x80utf8",
)

Instead, this happened:

[1.rs:11:5] f() = Ok(
    "utf8.non\u{d800}utf8",
)

(Obviously, Strings can't contain \u{d800}.)
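
For reference, a short check of why that result is impossible for a well-formed `String`. It uses only stable standard-library calls and runs on any platform; the helper function names are made up for this sketch.

```rust
// The wide string from the reproduction: "non", a lone high surrogate, "utf8".
const WIDE: [u16; 8] = [0x6e, 0x6f, 0x6e, 0xd800, 0x75, 0x74, 0x66, 0x38];

// U+D800 is a surrogate code point, so it is not a valid `char`.
fn surrogate_is_a_char() -> bool {
    char::from_u32(0xD800).is_some()
}

// Strict UTF-16 decoding rejects the unpaired surrogate.
fn wide_decodes_strictly() -> bool {
    String::from_utf16(&WIDE).is_ok()
}

// The bytes from the expected output are not valid UTF-8 either:
// ED A0 80 is the generalized (WTF-8) encoding of U+D800.
fn expected_bytes_are_utf8() -> bool {
    String::from_utf8(b"utf8.non\xED\xA0\x80utf8".to_vec()).is_ok()
}

fn main() {
    println!("U+D800 is a char: {}", surrogate_is_a_char());
    println!("wide string decodes strictly: {}", wide_decodes_strictly());
    println!("expected bytes are UTF-8: {}", expected_bytes_are_utf8());
}
```

All three checks come back false, so `into_string()` returning `Ok` here means a `String` was created that violates its own UTF-8 invariant.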

Meta

rustc --version --verbose:

rustc 1.81.0-nightly (d0227c6a1 2024-06-11)
binary: rustc
commit-hash: d0227c6a19c2d6e8dceb87c7a2776dc2b10d2a04
commit-date: 2024-06-11
host: x86_64-pc-windows-gnu
release: 1.81.0-nightly
LLVM version: 18.1.7

yhx-12243 added the C-bug Category: This is a bug. label Jun 12, 2024
rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 12, 2024
yhx-12243 changed the title as_mut_vec_for_path_buf in windows breaks UTF-8 is_known_utf8 assumption as_mut_vec_for_path_buf in windows breaks UTF-8 is_known_utf8 assumption Jun 12, 2024
workingjubilee added O-windows Operating system: Windows T-libs Relevant to the library team, which will review and decide on the PR/issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 12, 2024
rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 12, 2024

workingjubilee (Member) commented

Nice catch!

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 12, 2024
…ng-utf8-invariant, r=<try>

Make PathBuf less Ok with adding UTF-16 then `into_string`

Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869.

try-job: x86_64-msvc
apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 12, 2024
bors closed this as completed in 3862f01 Jun 12, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jun 12, 2024
Rollup merge of rust-lang#126305 - workingjubilee:fix-os-string-to-string-utf8-invariant, r=joboet

Make PathBuf less Ok with adding UTF-16 then `into_string`

Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869.

try-job: x86_64-msvc
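
The thread records only the title of the fix, not the patch. As an illustration of one conservative way to close this kind of hole (an assumption, not a transcription of #126305), the sketch below clears the flag whenever mutable access to the raw bytes is handed out, which forces the later conversion to revalidate. `ModelBuf`, `as_mut_vec`, and `convert` are hypothetical names.

```rust
// Hypothetical, simplified stand-in for Wtf8Buf (not the std implementation).
struct ModelBuf {
    bytes: Vec<u8>,
    is_known_utf8: bool,
}

impl ModelBuf {
    fn from_string(s: String) -> Self {
        ModelBuf { bytes: s.into_bytes(), is_known_utf8: true }
    }

    // Assumed fix: the caller may write arbitrary bytes, so stop claiming
    // that the contents are known to be UTF-8.
    fn as_mut_vec(&mut self) -> &mut Vec<u8> {
        self.is_known_utf8 = false;
        &mut self.bytes
    }

    // Returns the String on success, or the original bytes on failure.
    fn into_string(self) -> Result<String, Vec<u8>> {
        if self.is_known_utf8 {
            // Fast path: the flag is trusted. Real code might skip the scan
            // entirely; this model still checks and panics on a broken promise.
            Ok(String::from_utf8(self.bytes).expect("flag promised UTF-8"))
        } else {
            // Slow path: rescan the bytes before producing a String.
            String::from_utf8(self.bytes).map_err(|e| e.into_bytes())
        }
    }
}

// Appends `ext` to "utf8" through the raw accessor, then converts.
fn convert(ext: &[u8]) -> Result<String, Vec<u8>> {
    let mut buf = ModelBuf::from_string("utf8".to_owned());
    buf.as_mut_vec().extend_from_slice(ext);
    buf.into_string()
}

fn main() {
    println!("{:?}", convert(b".txt"));
    println!("{:?}", convert(b".non\xED\xA0\x80utf8"));
}
```

The cost of this approach is an extra scan for buffers that were valid UTF-8 all along. In exchange, the flag can only ever be wrong in the safe direction, which the doc comment on `is_known_utf8` already allows.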