-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emscripten metadata file size is wrong #131467
Comments
uhh does this reproduce on the latest nightly? |
Yes, unfortunately it still reproduces on My current workaround is the following: - let bytes = std::fs::read(path)?;
+ struct FileNoSize(std::fs::File);
+
+ impl std::io::Read for FileNoSize {
+ fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
+ self.0.read(buf)
+ }
+ }
+
+ let mut file = FileNoSize(std::fs::File::open(path)?);
+
+ let mut bytes = Vec::new();
+ std::io::Read::read_to_end(&mut file, &mut bytes)?; |
It seems we just call fstat64? It's not clear how that's not working out for us. |
We could have an incorrect definition of the Emscripten stat64 structure? |
emscripten's definition of |
Pyodide v0.25 uses Emscripten 3.1.46, so that would be after the change. The fix to libc was in rust-lang/libc@63b0d67 and released in v0.2.148, which Rust is already using. Huh. |
When building with a fixed version of |
Do you think that using |
Sure, give it a try! |
Yes, How could this footgun be avoided? |
Not sure; maybe the emscripten version used in |
Perhaps we could add some warning to the target list about this issue? Could someone point me in the direction of which file to edit? Cc @hoodmane: Pyodide packages using Rust should be built with build-std to ensure that they use the correct Emscripten ABI, otherwise e.g. file size metadata is garbage |
@juntyr Here, you would need to add an entry because the emscripten target doesn't have a doc for it yet, but you can add appropriate caveats to this page and in this folder: |
@kleisauke It seems you needed to also update https://github.com/rust-lang/rust/blob/master/src/ci/docker/scripts/emscripten.sh to finish the patch |
Ah, I think this is also the reason why #116655 failed without I just opened (draft) PR rust-lang/libc#3962 for this in libc to bump emsdk and make the changes in rust-lang/libc@63b0d67 unconditional. |
Thanks @juntyr for reporting this! I will add a test for this to Pyodide.
Yeah does sound like we should add Seems like I got rid of It would be really nice if Emscripten could adopt an abi versioning scheme so that other projects could know what is going on. |
Otherwise we are linking a copy of the standard library built with some particular version of emscripten selected by the rust compiler that does not match the version we are using. This can lead to an ABI mismatch. See rust-lang/rust#131467
By the way, is it possible for us to build the rust standard library once against our emscripten compiler and point all packages we build to this one copy of the rust stdlib? That way we would only have to rebuild when we change the version of rust or emscripten as opposed to every time we build any rust package. It would save a lot of CI time. |
This tests the fix for rust-lang/rust#131467 in pyodide/pyodide-build#40. The problem is that rust cannot correctly read file metadata because the ABI of fstat64 changed between the version of emscripten that the rust compiler built the standard library against and our version of the rust compiler. By passing `-Zbuild-std` we rebuild the rust standard library with the correct abi. I also added a test that works only if we pass `-Zbuild-std`.
When building the Rust crates, you could pass I'm not sure how to prepare the sysroot itself, but there is a crate https://crates.io/crates/cargo-sysroot that supports building with custom sysroots, so it might be useful to look at. |
…ersion, r=Kobzol emscripten: Use the latest emsdk 3.1.68 This should fix our precompiled std being unsound in `std::fs` code. Should resolve rust-lang#131467 I hope.
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-high |
Rollup merge of rust-lang#131533 - workingjubilee:update-emscripten-version, r=Kobzol emscripten: Use the latest emsdk 3.1.68 This should fix our precompiled std being unsound in `std::fs` code. Should resolve rust-lang#131467 I hope.
…, r=jieyouxu Add wasm32-unknown-emscripten platform support document This PR adds the platform support document for wasm32-unknown-emscripten, and adds a warning about breaks in Emscripten ABI compatibility (see rust-lang#131467). I mostly based the document off the wasm32-unknown-unknown docs and some of the information may still be missing (e.g. who's the target maintainer) or outdated (e.g. the build requirements). I still hope that it provides a good starting point. r? `@workingjubilee`
Rollup merge of rust-lang#131582 - juntyr:emscripten-platform-support, r=jieyouxu Add wasm32-unknown-emscripten platform support document This PR adds the platform support document for wasm32-unknown-emscripten, and adds a warning about breaks in Emscripten ABI compatibility (see rust-lang#131467). I mostly based the document off the wasm32-unknown-unknown docs and some of the information may still be missing (e.g. who's the target maintainer) or outdated (e.g. the build requirements). I still hope that it provides a good starting point. r? `@workingjubilee`
PR #131533 unfortunately didn't resolve the issue. https://github.com/rust-lang/rust/blob/master/src/ci/docker/scripts/emscripten.sh looks like a remnant of the now-removed We may need to either install emsdk in this Dockerfile: Or alternatively, always enforce the Standalone reproducer: $ curl https://sh.rustup.rs -sSf | sh -s -- -y --profile minimal --target wasm32-unknown-emscripten --default-toolchain nightly-2024-12-09 --component rust-src
$ source "$HOME/.cargo/env"
$ cargo new foo
$ cd foo
$ cat <<EOT > src/main.rs
use std::io;
use std::fs;
fn main() -> io::Result<()> {
let metadata = fs::metadata("hello.txt")?;
println!("{}", metadata.len());
Ok(())
}
EOT
$ RUSTFLAGS="-Clink-arg=-sNODERAWFS" cargo build --release -Zbuild-std=panic_abort,std --target wasm32-unknown-emscripten
$ echo "Hello, world!" > hello.txt
$ node target/wasm32-unknown-emscripten/release/foo.js
14
$ RUSTFLAGS="-Clink-arg=-sNODERAWFS" cargo build --release --target wasm32-unknown-emscripten
$ node target/wasm32-unknown-emscripten/release/foo.js
1733771096 |
I tried this code:
for a file that is <1MB large.
I expected to see this happen: The read should allocate a vec that is as large as the file content. The heap size in Emscripten should not meaningfully change.
Instead, this happened: The Emscripten heap grows by 1.8GB
I looked into the implementation of fs::read. Since it uses the file metadata size to pre-allocate the output vector, I debug printed the file size, which is around 1.8GB.
Since my Rust code is running inside Pyodide 0.25, I also checked Python's os.stat of the file, which reports the correct file size. Reading the file in Python also succeeds without the enormous over-allocation. It seems that this is a Rust-specific bug in getting the correct file size information from Emscripten.
Meta
My Rust toolchain is pinned to nightly-2024-07-21, the last nightly for 1.81.
The text was updated successfully, but these errors were encountered: