Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emscripten metadata file size is wrong #131467

Closed
juntyr opened this issue Oct 9, 2024 · 20 comments · Fixed by #131533
Closed

Emscripten metadata file size is wrong #131467

juntyr opened this issue Oct 9, 2024 · 20 comments · Fixed by #131533
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-emscripten Target: 50% off wasm32-unknown-musl. the savings come out of stdio.h, but hey, you get SDL! O-wasi Operating system: Wasi, Webassembly System Interface P-high High priority T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@juntyr
Copy link
Contributor

juntyr commented Oct 9, 2024

I tried this code:

std::fs::read("my-file")

for a file that is <1MB large.

I expected to see this happen: The read should allocate a vec that is as large as the file content. The heap size in Emscripten should not meaningfully change.

Instead, this happened: The Emscripten heap grows by 1.8GB

I looked into the implementation of fs::read. Since it uses the file metadata size to pre-allocate the output vector, I debug printed the file size, which is around 1.8GB.

Since my Rust code is running inside Pyodide 0.25, I also checked Python's os.stat of the file, which reports the correct file size. Reading the file in Python also succeeds without the enormous over-allocation. It seems that this is a Rust-specific bug in getting the correct file size information from Emscripten.

Meta

My Rust toolchain is pinned to nightly-2024-07-21, the last nightly for 1.81.

@juntyr juntyr added the C-bug Category: This is a bug. label Oct 9, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 9, 2024
@workingjubilee
Copy link
Member

uhh does this reproduce on the latest nightly?

@jieyouxu jieyouxu added O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-libs Relevant to the library team, which will review and decide on the PR/issue. O-wasi Operating system: Wasi, Webassembly System Interface and removed O-wasm Target: WASM (WebAssembly), http://webassembly.org/ labels Oct 10, 2024
@juntyr
Copy link
Contributor Author

juntyr commented Oct 10, 2024

uhh does this reproduce on the latest nightly?

Yes, unfortunately it still reproduces on rustc 1.83.0-nightly (eb4e23467 2024-10-09).

My current workaround is the following:

- let bytes = std::fs::read(path)?;
+ struct FileNoSize(std::fs::File);
+
+ impl std::io::Read for FileNoSize {
+     fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
+         self.0.read(buf)
+     }
+ }
+
+ let mut file = FileNoSize(std::fs::File::open(path)?);
+
+ let mut bytes = Vec::new();
+ std::io::Read::read_to_end(&mut file, &mut bytes)?;

@workingjubilee
Copy link
Member

It seems we just call fstat64? It's not clear how that's not working out for us.

@workingjubilee
Copy link
Member

We could have an incorrect definition of the Emscripten stat64 structure?

@kadiwa4
Copy link
Contributor

kadiwa4 commented Oct 10, 2024

emscripten's definition of stat64 changed in version 3.1.42, which is accounted for in the libc crate. Maybe that can lead to version-mismatch problems? Which version are you using? From this file, it seems like the Rust standard library still targets 2.0.5.

@juntyr
Copy link
Contributor Author

juntyr commented Oct 10, 2024

Pyodide v0.25 uses Emscripten 3.1.46, so that would be after the change. The fix to libc was in rust-lang/libc@63b0d67 and released in v0.2.148, which Rust is already using. Huh.

@kadiwa4
Copy link
Contributor

kadiwa4 commented Oct 10, 2024

When building with a fixed version of libc, it does a check in the build script to see if the installed version of emscripten uses the old or new definition. I assume you downloaded the emscripten target from rustup, which would mean that you're using a precompiled standard library. So the question is really which version of emscripten is installed on the builders that built the standard library you are using.

@juntyr
Copy link
Contributor Author

juntyr commented Oct 10, 2024

Do you think that using -Zbuild-std would help here?

@kadiwa4
Copy link
Contributor

kadiwa4 commented Oct 10, 2024

Sure, give it a try!

@juntyr
Copy link
Contributor Author

juntyr commented Oct 10, 2024

Yes, -Zbuild-std works both on the latest nightly and my pin.

How could this footgun be avoided?

@kadiwa4
Copy link
Contributor

kadiwa4 commented Oct 10, 2024

Not sure; maybe the emscripten version used in src/ci/docker/scripts/emscripten.sh could be updated and then a minimum version requirement for the emscripten toolchain could be added.
Note that I'm not a target maintainer or anything, just a fellow contributor :)

@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 11, 2024
@juntyr
Copy link
Contributor Author

juntyr commented Oct 11, 2024

Perhaps we could add some warning to the target list about this issue? Could someone point me in the direction of which file to edit?

Cc @hoodmane: Pyodide packages using Rust should be built with build-std to ensure that they use the correct Emscripten ABI, otherwise e.g. file size metadata is garbage

@workingjubilee
Copy link
Member

@juntyr Here, you would need to add an entry because the emscripten target doesn't have a doc for it yet, but you can add appropriate caveats to this page and in this folder:

@workingjubilee workingjubilee added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Oct 11, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 11, 2024
@workingjubilee workingjubilee added the O-emscripten Target: 50% off wasm32-unknown-musl. the savings come out of stdio.h, but hey, you get SDL! label Oct 11, 2024
@workingjubilee
Copy link
Member

@kleisauke It seems you needed to also update https://github.com/rust-lang/rust/blob/master/src/ci/docker/scripts/emscripten.sh to finish the patch

@kleisauke
Copy link
Contributor

Ah, I think this is also the reason why #116655 failed without -Zbuild-std.

I just opened (draft) PR rust-lang/libc#3962 for this in libc to bump emsdk and make the changes in rust-lang/libc@63b0d67 unconditional.

@hoodmane
Copy link
Contributor

Thanks @juntyr for reporting this! I will add a test for this to Pyodide.

Pyodide packages using Rust should be built with build-std to ensure that they use the correct Emscripten ABI, otherwise e.g. file size metadata is garbage

Yeah does sound like we should add -Zbuild-std back in to the rust flags. It's definitely not safe to build against one version of Emscripten and then link against a different one. I thought we'd been passing it all along, but looking through Pyodide's git history I got rid of it almost immediately after adding rust support. We should probably encourage everyone targeting Emscripten to pass -Zbuild-std. Not doing that requires ABI stability.

Seems like I got rid of -Zbuild-std because I was hoping to get off of nightly, but we're stuck on nightly anyways because we have to pass -Z link-native-libraries=no because Emscripten fails if you pass -lc when linking a shared library.
emscripten-core/emscripten#16680 (comment)

It would be really nice if Emscripten could adopt an abi versioning scheme so that other projects could know what is going on.
@sbc100
emscripten-core/emscripten#15917

hoodmane added a commit to hoodmane/pyodide-build that referenced this issue Oct 11, 2024
Otherwise we are linking a copy of the standard library built with some
particular version of emscripten selected by the rust compiler that does not
match the version we are using. This can lead to an ABI mismatch.

See rust-lang/rust#131467
@hoodmane
Copy link
Contributor

By the way, is it possible for us to build the rust standard library once against our emscripten compiler and point all packages we build to this one copy of the rust stdlib? That way we would only have to rebuild when we change the version of rust or emscripten as opposed to every time we build any rust package. It would save a lot of CI time.

hoodmane added a commit to hoodmane/pyodide that referenced this issue Oct 11, 2024
This tests the fix for rust-lang/rust#131467 in pyodide/pyodide-build#40.
The problem is that rust cannot correctly read file metadata because the ABI of
fstat64 changed between the version of emscripten that the rust compiler built
the standard library against and our version of the rust compiler.

By passing `-Zbuild-std` we rebuild the rust standard library with the correct
abi. I also added a test that works only if we pass `-Zbuild-std`.
@juntyr
Copy link
Contributor Author

juntyr commented Oct 11, 2024

By the way, is it possible for us to build the rust standard library once against our emscripten compiler and point all packages we build to this one copy of the rust stdlib? That way we would only have to rebuild when we change the version of rust or emscripten as opposed to every time we build any rust package. It would save a lot of CI time.

When building the Rust crates, you could pass --sysroot <PATH> to a prebuilt sysroot. To each crate, it would look again just normal (no build-std anymore).

I'm not sure how to prepare the sysroot itself, but there is a crate https://crates.io/crates/cargo-sysroot that supports building with custom sysroots, so it might be useful to look at.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 13, 2024
…ersion, r=Kobzol

emscripten: Use the latest emsdk 3.1.68

This should fix our precompiled std being unsound in `std::fs` code.

Should resolve rust-lang#131467 I hope.
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 14, 2024
@bors bors closed this as completed in eb060f6 Oct 14, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 14, 2024
Rollup merge of rust-lang#131533 - workingjubilee:update-emscripten-version, r=Kobzol

emscripten: Use the latest emsdk 3.1.68

This should fix our precompiled std being unsound in `std::fs` code.

Should resolve rust-lang#131467 I hope.
Urgau added a commit to Urgau/rust that referenced this issue Oct 16, 2024
…, r=jieyouxu

Add wasm32-unknown-emscripten platform support document

This PR adds the platform support document for wasm32-unknown-emscripten, and adds a warning about breaks in Emscripten ABI compatibility (see rust-lang#131467).

I mostly based the document off the wasm32-unknown-unknown docs and some of the information may still be missing (e.g. who's the target maintainer) or outdated (e.g. the build requirements). I still hope that it provides a good starting point.

r? `@workingjubilee`
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 16, 2024
Rollup merge of rust-lang#131582 - juntyr:emscripten-platform-support, r=jieyouxu

Add wasm32-unknown-emscripten platform support document

This PR adds the platform support document for wasm32-unknown-emscripten, and adds a warning about breaks in Emscripten ABI compatibility (see rust-lang#131467).

I mostly based the document off the wasm32-unknown-unknown docs and some of the information may still be missing (e.g. who's the target maintainer) or outdated (e.g. the build requirements). I still hope that it provides a good starting point.

r? `@workingjubilee`
@kleisauke
Copy link
Contributor

PR #131533 unfortunately didn't resolve the issue. https://github.com/rust-lang/rust/blob/master/src/ci/docker/scripts/emscripten.sh looks like a remnant of the now-removed asmjs-unknown-emscripten target (see PR #117338). As a result, the emscripten_new_stat_abi cfg added in rust-lang/libc@63b0d67 was not enabled when the prebuilt std was built.

We may need to either install emsdk in this Dockerfile:
https://github.com/rust-lang/rust/blob/master/src/ci/docker/host-x86_64/dist-various-1/Dockerfile

Or alternatively, always enforce the emscripten_new_stat_abi cfg in libc and require Emscripten 3.1.42 or later in Rust, see:
rust-lang/libc#3962 (comment)

Standalone reproducer:

$ curl https://sh.rustup.rs -sSf | sh -s -- -y --profile minimal --target wasm32-unknown-emscripten --default-toolchain nightly-2024-12-09 --component rust-src
$ source "$HOME/.cargo/env"
$ cargo new foo
$ cd foo
$ cat <<EOT > src/main.rs
use std::io;
use std::fs;

fn main() -> io::Result<()> {
    let metadata = fs::metadata("hello.txt")?;
    println!("{}", metadata.len());
    Ok(())
}
EOT
$ RUSTFLAGS="-Clink-arg=-sNODERAWFS" cargo build --release -Zbuild-std=panic_abort,std --target wasm32-unknown-emscripten
$ echo "Hello, world!" > hello.txt
$ node target/wasm32-unknown-emscripten/release/foo.js
14
$ RUSTFLAGS="-Clink-arg=-sNODERAWFS" cargo build --release --target wasm32-unknown-emscripten
$ node target/wasm32-unknown-emscripten/release/foo.js
1733771096

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-emscripten Target: 50% off wasm32-unknown-musl. the savings come out of stdio.h, but hey, you get SDL! O-wasi Operating system: Wasi, Webassembly System Interface P-high High priority T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants