Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NVPTX target specification #57937

Merged
merged 8 commits into from
Feb 2, 2019
Merged

NVPTX target specification #57937

merged 8 commits into from
Feb 2, 2019

Conversation

denzp
Copy link
Contributor

@denzp denzp commented Jan 27, 2019

This change adds a built-in nvptx64-nvidia-cuda GPGPU no-std target specification and a basic PTX assembly smoke tests.

The approach is taken here and the target spec is based on ptx-linker, a project started about 1.5 years ago. Key feature: bitcode object files being linked with LTO into the final module on the linker's side.

Prior to this change, the linker used a ld linker-flavor, but I think, having the special CLI convention is a more reliable way.

Questions about further progress on reliable CUDA workflow with Rust:

  1. Is it possible to create a test suite codegen-asm to verify end-to-end integration with LLVM backend?
  2. How would it be better to organise no-std compile-fail tests: add #![no_std] where possible and mark others as ignore-nvptx directive, or alternatively, introduce compile-fail-no-std test suite?
  3. Can we have the ptx-linker eventually be integrated as rls or clippy? Hopefully, this should allow to statically link against LLVM used in Rust and get rid of the current hacky solution.
  4. Am I missing some methods from rustc_codegen_ssa::back::linker::Linker that can be useful for bitcode-only linking?

Currently, there are no major public CUDA projects written in Rust I'm aware of, but I'm expecting to have a built-in target will create a solid foundation for further experiments and awesome crates.

Related to #38789
Fixes #38787
Fixes #38786

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cramertj (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 27, 2019
@Centril
Copy link
Contributor

Centril commented Jan 27, 2019

r? @nagisa cc @rkruppe cc @alexcrichton

@rust-highfive rust-highfive assigned nagisa and unassigned cramertj Jan 27, 2019
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:2f5eff36:start=1548620577801138403,finish=1548620580132025617,duration=2330887214
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---

[00:03:30] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:30] tidy error: /checkout/src/librustc_target/spec/nvptx64_nvidia_cuda.rs:27: TODO is deprecated; use FIXME
[00:03:30] tidy error: /checkout/src/librustc_target/spec/nvptx64_nvidia_cuda.rs:54: TODO is deprecated; use FIXME
[00:03:32] some tidy checks failed
[00:03:32] 
[00:03:32] 
[00:03:32] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:32] 
[00:03:32] 
[00:03:32] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:32] Build completed unsuccessfully in 0:00:48
[00:03:32] Build completed unsuccessfully in 0:00:48
[00:03:32] Makefile:68: recipe for target 'tidy' failed
[00:03:32] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0684e740
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Sun Jan 27 20:26:44 UTC 2019
---
travis_time:end:1612cfd0:start=1548620805407301115,finish=1548620805412457762,duration=5156647
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:02355b0b
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:0524b050
travis_time:start:0524b050
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:21722ef8
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

cpu: "sm_20".to_string(),

// TODO(denzp): create tests for the atomics.
max_atomic_width: Some(64),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I did a few basic tests, and it turns out the the atomic_xadd rust-intrinsic seems to already generate correct PTX when doing atomic adds on u32 and u64. Hurray! I didn't test too many other atomic ops or sizes though.

(Sadly, atomic_xadd only supports integers and doesn't compile for f32 and f64 types.)

Copy link
Member

@nagisa nagisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now comments are as is. I find tests somewhat sketchy, but I understand that the hand may be forced due to inadequate flexibility of other test suites.

}

fn optimize(&mut self) {
self.cmd.arg(match self.sess.opts.optimize {
Copy link
Member

@nagisa nagisa Jan 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels suspect. I’m not sure the optimisation level for the leaf crate should influence the optimisation level for the whole crate graph. It feels like this should at least in some way depend on the LTO flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed args pattern from other linkers here.
At the moment the linker runs both LTO and optimisation passes over the final (complete) module when -O{1,2,3} is specified.

Does Rust runs optimisations before emitting bitcode object files? If yes, it can be avoided on linker's side then.

Copy link
Member

@nagisa nagisa Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Rust runs optimisations before emitting bitcode object files?

Yes it does.

I followed args pattern from other linkers here.

For gcc and/or clang optimization flags during linkage mean very little if anything at all AFAIK. It definitely does not invoke some sort of global program re-optimisation, which is what happens with ptx-linker, it seems.

Such global-program optimisation is what lto would usually do, which is why I suggested that perhaps that flag is what should be accounted for :)

That could be left as a future endeavour as well, but invoking “LTO” just from -Copt-level feels wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation and suggestion, I overlooked Session::lto(&self) before!

I've also changed ptx-linker to not perform final global optimisation - indeed it didn't affect much.

src/librustc_codegen_utils/symbol_names.rs Outdated Show resolved Hide resolved
src/librustc_codegen_utils/symbol_names.rs Outdated Show resolved Hide resolved
src/test/run-make/nvptx-binary-crate/Makefile Outdated Show resolved Hide resolved
@@ -149,6 +149,7 @@ pub fn linker_and_flavor(sess: &Session) -> (PathBuf, LinkerFlavor) {
LinkerFlavor::Ld => "ld",
LinkerFlavor::Msvc => "link.exe",
LinkerFlavor::Lld(_) => "lld",
LinkerFlavor::PtxLinker => "rust-ptx-linker",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm… so we are depending on external project by default here. Something that users are very unlikely to have installed by default.

I wonder what the error looks like when rust-ptx-linker is not in $PATH. Perhaps it would make sense to ship rust-ptx-linker as part of rustlib like we do with e.g. lld component for some targets...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. On the other hand, to get started with CUDA development in Rust, users will follow either docs or tutorials, where it should be mentioned how to setup the environment.

The error message is, currently:

error: linker `rust-ptx-linker` not found
  |
  = note: No such file or directory (os error 2)

error: aborting due to previous error

Personally, I would love to have the linker to be shipped via rustup. But perhaps, it will be preferable to implement this as a separate PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But perhaps, it will be preferable to implement this as a separate PR?

Sure.

@denzp
Copy link
Contributor Author

denzp commented Jan 28, 2019

@nagisa @peterhj thanks for reviewing!

I can agree about tests - normally these 3 run-make cases should be divided into many smaller ones. That's the reason why I call them "smoke tests" - they verify only basic functionality. The reason why I decided to condense the test cases is to avoid much similar boilerplate Makefile code that needs to be maintained.

How can I use codegen tests to assert assembly output? Also, how can I deal with the fact that the target is no-std and existing tests there often rely on libstd?

That's why I had an idea about introducing codegen-asm with mandatory target specification for each test case.

@nagisa
Copy link
Member

nagisa commented Jan 28, 2019

How can I use codegen tests to assert assembly output?

Currently, AFAIK there is no way. Minor changes to compiletest are necessary to make it possible (i.e. you can tell codegen test to generate assembly, but no way to tell compiletest to look at anything but .ll output). For now it is fine to keep these tests as make-tests. We will figure out how to migrate them once compiletest is improved to support such use-case.

@alexcrichton
Copy link
Member

Thanks for this!

The only comment I'd add is that we unfortunately don't have resources for another image to run on Travis right now, but could this be folded into an existing builder that's already performing well under 2h on average?

@denzp
Copy link
Contributor Author

denzp commented Jan 28, 2019

@alexcrichton I can see that wasm-unknown image is usually runs pretty fast compared to other non-dist images.

Can I add NVPTX tests there? The only thing, it feels like the image should be renamed then. Do you have suggestions about the naming?

@alexcrichton
Copy link
Member

@denzp sure yeah so long as it still runs in under 2 hrs should be fine to add! It should be fine to rename it to something like test-various perhaps?


ifeq ($(TARGET),nvptx64-nvidia-cuda)
all:
$(RUSTC) main.rs -Clink-arg=--arch=sm_60 --crate-type="bin" -O --target $(TARGET)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does RUSTFLAGS=-C target-cpu=sm_60 achieve the same effect here as passing the link flags ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a nice catch.
I've addressed this providing target_cpu via --fallback-arch arg to the linker in the latest commit.

@denzp
Copy link
Contributor Author

denzp commented Jan 29, 2019

@alexcrichton thanks, I've merged the images as suggested!

@alexcrichton
Copy link
Member

👍

@nagisa
Copy link
Member

nagisa commented Feb 1, 2019

The compiler bits look good to me. I’ll assume that Alex’s 👍 is an approval for the infra changes.

@bors r+

@bors
Copy link
Contributor

bors commented Feb 1, 2019

📌 Commit 49931fd has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 1, 2019
@bors
Copy link
Contributor

bors commented Feb 1, 2019

⌛ Testing commit 49931fd with merge 2efa31b...

bors added a commit that referenced this pull request Feb 1, 2019
NVPTX target specification

This change adds a built-in `nvptx64-nvidia-cuda` GPGPU no-std target specification and a basic PTX assembly smoke tests.

The approach is taken here and the target spec is based on `ptx-linker`, a project started about 1.5 years ago. Key feature: bitcode object files being linked with LTO into the final module on the linker's side.

Prior to this change, the linker used a `ld` linker-flavor, but I think, having the special CLI convention is a more reliable way.

Questions about further progress on reliable CUDA workflow with Rust:
1. Is it possible to create a test suite `codegen-asm` to verify end-to-end integration with LLVM backend?
1. How would it be better to organise no-std `compile-fail` tests: add `#![no_std]` where possible and mark others as `ignore-nvptx` directive, or alternatively, introduce `compile-fail-no-std` test suite?
1. Can we have the `ptx-linker` eventually be integrated as `rls` or `clippy`? Hopefully, this should allow to statically link against LLVM used in Rust and get rid of the [current hacky solution](https://github.com/denzp/rustc-llvm-proxy).
1. Am I missing some methods from `rustc_codegen_ssa::back::linker::Linker` that can be useful for bitcode-only linking?

Currently, there are no major public CUDA projects written in Rust I'm aware of, but I'm expecting to have a built-in target will create a solid foundation for further experiments and awesome crates.

Related to #38789
Fixes #38787
Fixes #38786
@bors
Copy link
Contributor

bors commented Feb 2, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: nagisa
Pushing 2efa31b to master...

@bors bors merged commit 49931fd into rust-lang:master Feb 2, 2019
@denzp denzp deleted the nvptx branch February 2, 2019 16:14
@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 4, 2019

Thank you @denzp !

kennytm added a commit to kennytm/rust that referenced this pull request Feb 5, 2019
Add NVPTX target to a build manifest

Include `nvptx64-nvidia-cuda` target to a build manifest. I forgot this step at my first take on adding the target (rust-lang#57937).

Hopefully, this is the only reason why `rustup target add nvptx64-nvidia-cuda` doesn't work 🙁

r? @alexcrichton
@termoshtt termoshtt mentioned this pull request Dec 30, 2019
7 tasks
ehuss added a commit to ehuss/rust-forge that referenced this pull request May 15, 2020
The following targets are now built and distributed:

- aarch64-unknown-none: rust-lang/rust#68334
- mips64-unknown-linux-muslabi64: rust-lang/rust#65843
- mips64el-unknown-linux-muslabi64: rust-lang/rust#65843
- nvptx64-nvidia-cuda: rust-lang/rust#57937
- riscv32i-unknown-none-elf: rust-lang/rust#62784
- riscv64gc-unknown-linux-gnu: rust-lang/rust#68037
- thumbv8m.base-none-eabi: rust-lang/rust#59182
- thumbv8m.main-none-eabi : rust-lang/rust#56954
- thumbv8m.main-none-eabihf: rust-lang/rust#59182
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants