Target cpu option is passed only to the final compiler target, not to its deps #355

Closed
zheland opened this issue Dec 23, 2024 · 0 comments · Fixed by #356

Summary

Currently target-cpu is passed using rustc args. However, rustc args are passed only to the final compiler target, not to its deps. As a result, when --native or --target-cpu is specified, only the last target is compiled with optimizations for the current or specified processor. If cargo asm is called for a test, example or bench, then the library or binary itself is compiled without target-cpu as well.

Inline functions may be an exception, since their code may be generated in the final crate. But library items gated on #[cfg(target_feature = "sse3")], for example, won't be compiled when cargo asm --native is called for the library's test, example or bench, even on an sse3-compatible processor.

References

From cargo rustc description at https://doc.rust-lang.org/cargo/commands/cargo-rustc.html:

cargo rustc [options] [-- args]
The specified args will all be passed to the final compiler invocation, not any of the dependencies.

From https://doc.rust-lang.org/cargo/reference/environment-variables.html

RUSTFLAGS — A space-separated list of custom flags to pass to all compiler invocations that Cargo performs. In contrast with cargo rustc, this is useful for passing a flag to all compiler instances.

From cargo asm --help

...
Available options:
...
        --native              Optimize for the CPU running the compiler
        --target-cpu=CPU      Optimize code for a specific CPU, see 'rustc --print target-cpus'
...

Example with cargo rustc -v

The script below runs the same build twice, first passing target-cpu using rustc args and then passing it using RUSTFLAGS, and diffs the verbose output of the two runs:

(
    cargo clean
    first="$(
        cargo rustc -v --release --color always \
        --example example -- -C target-cpu=native --emit asm 2>&1
    )"

    cargo clean
    second="$(
        RUSTFLAGS='-C target-cpu=native' \
        cargo rustc -v --release --color always \
        --example example -- --emit asm 2>&1
    )"

    git diff --no-index --word-diff=porcelain --word-diff-regex=. --color=always \
        <( echo "$first" ) <( echo "$second" ) | cat
)

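In the diff below, text prefixed with + appears only in the RUSTFLAGS run, and text prefixed with - appears only in the rustc args run.

The first invocation compiles the dependency (bytemuck).
The second invocation compiles the library.
The final invocation compiles the library example.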

diff --git a/dev/fd/63 b/dev/fd/62
--- a/dev/fd/63
+++ b/dev/fd/62
@@ -1,6 +1,6 @@
    Compiling bytemuck v1.21.0
~
      Running `/home/zheland/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc --crate-name bytemuck --edition=2018 /home/zheland/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bytemuck-1.21.0/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --deny=unexpected_cfgs --check-cfg 'cfg(target_arch, values("spirv"))' --check-cfg 'cfg(docsrs)' --check-cfg 'cfg(feature, values("aarch64_simd", "align_offset", "alloc_uninit", "avx512_simd", "bytemuck_derive", "const_zeroed", "derive", "extern_crate_alloc", "extern_crate_std", "latest_stable_rust", "min_const_generics", "must_cast", "must_cast_extra", "nightly_docs", "nightly_float", "nightly_portable_simd", "nightly_stdsimd", "track_caller", "transparentwrapper_extra", "unsound_ptr_pod_impl", "wasm_simd", "zeroable_atomics", "zeroable_maybe_uninit"))' -C metadata=a6e55f6ecfe0cb44 -C extra-filename=-a6e55f6ecfe0cb44 --out-dir /home/zheland/dev/test/test-cfg-feature/target/release/deps -C strip=debuginfo -L dependency=/home/zheland/dev/test/test-cfg-feature/target/release/deps --cap-lints allow
+ -C target-cpu=native
 `
~
    Compiling example v0.1.0 (/home/zheland/dev/test/test-cfg-feature)
~
      Running `/home/zheland/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc --crate-name example --edition=2021 src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --check-cfg 'cfg(docsrs)' --check-cfg 'cfg(feature, values())' -C metadata=e36f30cdf84b16ad -C extra-filename=-e36f30cdf84b16ad --out-dir /home/zheland/dev/test/test-cfg-feature/target/release/deps -C strip=debuginfo -L dependency=/home/zheland/dev/test/test-cfg-feature/target/release/deps --extern bytemuck=/home/zheland/dev/test/test-cfg-feature/target/release/deps/libbytemuck-a6e55f6ecfe0cb44.rmeta
+ -C target-cpu=native
 `
~
      Running `/home/zheland/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc --crate-name example --edition=2021 examples/example.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C opt-level=3 -C embed-bitcode=no -
-C target-cpu=native -
 -emit asm --check-cfg 'cfg(docsrs)' --check-cfg 'cfg(feature, values())' -C metadata=3dff7f25c1298419 -C extra-filename=-3dff7f25c1298419 --out-dir /home/zheland/dev/test/test-cfg-feature/target/release/examples -C strip=debuginfo -L dependency=/home/zheland/dev/test/test-cfg-feature/target/release/deps --extern bytemuck=/home/zheland/dev/test/test-cfg-feature/target/release/deps/libbytemuck-a6e55f6ecfe0cb44.rlib --extern example=/home/zheland/dev/test/test-cfg-feature/target/release/deps/libexample-e36f30cdf84b16ad.rlib
+ -C target-cpu=native
 `
~
     Finished `release` profile [optimized] target(s) in 0.17s
~

Example with cargo asm --native

src/lib.rs:

#[no_mangle]
pub fn ffi_has_sse3_than_2_else_1() -> u32 {
    has_sse3_than_2_else_1()
}

#[cfg(target_feature = "sse3")]
#[inline]
pub fn has_sse3_than_2_else_1() -> u32 {
    2 // Different numbers are used to avoid function merging
}

#[cfg(not(target_feature = "sse3"))]
#[inline]
pub fn has_sse3_than_2_else_1() -> u32 {
    1
}

tests/test.rs:

#[no_mangle]
pub fn ffi_lib_has_sse3_then_2_else_1() -> u32 {
    library::has_sse3_than_2_else_1()
}

#[no_mangle]
pub fn ffi_test_has_sse3_then_4_else_3() -> u32 {
    test_has_sse3_than_4_else_3()
}

#[cfg(target_feature = "sse3")]
#[inline]
pub fn test_has_sse3_than_4_else_3() -> u32 {
    4
}

#[cfg(not(target_feature = "sse3"))]
#[inline]
pub fn test_has_sse3_than_4_else_3() -> u32 {
    3
}

Output:

$ cargo asm --native --lib ffi_has_sse3_than_2_else_1
	mov eax, 2 # lib compiled with target-cpu=native
	ret
$ cargo asm --native --test test ffi_lib_has_sse3_then_2_else_1
	mov eax, 1 # lib compiled without target-cpu=native
	ret
$ cargo asm --native --test test ffi_test_has_sse3_then_4_else_3
	mov eax, 4 # test compiled with target-cpu=native
	ret

Possible solutions

  1. Just mention in the help that the --native and --target-cpu options may have no effect on dependencies, or even on the crate's own library when cargo asm is called for its example, bench or test. But I'd say it's reasonable to expect the --native and --target-cpu flags to affect dependencies as well.

  2. Pass target-cpu using RUSTFLAGS.
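
As a rough illustration of solution 2, the sketch below builds a cargo rustc command that carries target-cpu in the RUSTFLAGS environment variable instead of in the trailing rustc args. It is not the code from the fix in #356; the helper names rustflags_with_target_cpu and cargo_rustc_command are hypothetical, and appending to any RUSTFLAGS the user has already set is an assumption about the desired behavior.

```rust
use std::process::Command;

/// Hypothetical helper: appends `-C target-cpu=<cpu>` to any flags
/// the user already has in RUSTFLAGS, so those flags are not lost.
fn rustflags_with_target_cpu(existing: Option<&str>, cpu: &str) -> String {
    let flag = format!("-C target-cpu={cpu}");
    match existing.map(str::trim) {
        Some(flags) if !flags.is_empty() => format!("{flags} {flag}"),
        _ => flag,
    }
}

/// Hypothetical helper: builds (but does not run) a `cargo rustc` command.
/// target-cpu goes into RUSTFLAGS, so Cargo passes it to every compiler
/// invocation; only `--emit asm` stays in the final-target args.
fn cargo_rustc_command(existing: Option<&str>, cpu: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["rustc", "--release", "--", "--emit", "asm"]);
    cmd.env("RUSTFLAGS", rustflags_with_target_cpu(existing, cpu));
    cmd
}

fn main() {
    // Read the caller's RUSTFLAGS, if any, and extend it.
    let existing = std::env::var("RUSTFLAGS").ok();
    let cmd = cargo_rustc_command(existing.as_deref(), "native");
    // Print the command for inspection instead of spawning it.
    println!("{cmd:?}");
}
```

One thing to keep in mind with this approach: Cargo treats the RUSTFLAGS environment variable as taking precedence over build.rustflags in the config file, so a real implementation would probably need to decide how to handle flags configured there.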

Workaround

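Until this is fixed, avoid cargo asm --native and cargo asm --target-cpu=CPU. Set the flag through the environment instead: RUSTFLAGS='-C target-cpu=native' cargo asm, or RUSTFLAGS='-C target-cpu=CPU' cargo asm for a specific CPU.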
