NVPTX target specification #57937
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cramertj (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
r? @nagisa cc @rkruppe cc @alexcrichton
    cpu: "sm_20".to_string(),

    // TODO(denzp): create tests for the atomics.
    max_atomic_width: Some(64),
BTW I did a few basic tests, and it turns out that the `atomic_xadd` rust-intrinsic already seems to generate correct PTX when doing atomic adds on `u32` and `u64`. Hurray! I didn't test many other atomic ops or sizes though.
(Sadly, `atomic_xadd` only supports integers and doesn't compile for `f32` and `f64` types.)
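As a rough illustration of the point above, here is a minimal host-side sketch. It uses the stable `AtomicU64::fetch_add` wrapper for the integer case and emulates an `f32` add with a compare-and-swap loop over the bit pattern. The helper names `add_u64` and `add_f32` are hypothetical, and none of this was tested against the NVPTX target in this thread; whether the CAS loop lowers to efficient PTX is an open assumption.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Integer atomic add through the stable wrapper type.
/// Returns the value stored before the addition.
pub fn add_u64(counter: &AtomicU64, value: u64) -> u64 {
    counter.fetch_add(value, Ordering::Relaxed)
}

/// Hypothetical helper: emulate an atomic `f32` add with a CAS loop,
/// since there is no integer-style atomic add for floats.
/// Returns the value stored before the addition.
pub fn add_f32(cell: &AtomicU32, value: f32) -> f32 {
    let mut current = cell.load(Ordering::Relaxed);
    loop {
        // Compute the new bit pattern from the value we last observed.
        let new = (f32::from_bits(current) + value).to_bits();
        match cell.compare_exchange_weak(current, new, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(previous) => return f32::from_bits(previous),
            // Another thread changed the cell (or the CAS failed spuriously): retry.
            Err(observed) => current = observed,
        }
    }
}
```

The float variant stores the `f32` as its `u32` bit pattern, so callers would initialise the cell with `AtomicU32::new(x.to_bits())`.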
For now comments are as is. I find tests somewhat sketchy, but I understand that the hand may be forced due to inadequate flexibility of other test suites.
    }

    fn optimize(&mut self) {
        self.cmd.arg(match self.sess.opts.optimize {
This feels suspect. I’m not sure the optimisation level for the leaf crate should influence the optimisation level for the whole crate graph. It feels like this should at least in some way depend on the LTO flag?
I followed the args pattern from other linkers here.
At the moment the linker runs both LTO and optimisation passes over the final (complete) module when `-O{1,2,3}` is specified.
Does Rust run optimisations before emitting bitcode object files? If yes, they can be skipped on the linker's side.
> Does Rust runs optimisations before emitting bitcode object files?
Yes it does.
> I followed args pattern from other linkers here.
For `gcc` and/or `clang`, optimization flags during linkage mean very little, if anything at all, AFAIK. It definitely does not invoke some sort of global program re-optimisation, which is what happens with ptx-linker, it seems.
Such global-program optimisation is what `lto` would usually do, which is why I suggested that perhaps that flag is what should be accounted for :)
That could be left as a future endeavour as well, but invoking “LTO” just from `-Copt-level` feels wrong.
Thanks for the explanation and suggestion, I overlooked `Session::lto(&self)` before!
I've also changed `ptx-linker` to not perform final global optimisation - indeed it didn't affect much.
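The outcome of this thread can be sketched roughly as follows: the linker is asked for whole-module optimisation only when LTO is requested, and the opt level alone never triggers it. The `Lto` and `OptLevel` enums below are hypothetical, simplified stand-ins for the compiler's session types, and the `-Olto` spelling is an assumption for illustration rather than something quoted from this discussion.

```rust
/// Hypothetical, simplified stand-in for the session's LTO setting.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Lto {
    No,
    Thin,
    Fat,
}

/// Hypothetical, simplified stand-in for `-C opt-level`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum OptLevel {
    No,
    Default,
    Aggressive,
}

/// Build the optimisation-related linker arguments.
/// The opt level is deliberately ignored: per-crate optimisation has
/// already happened before the bitcode reaches the linker.
pub fn optimisation_args(_opt: OptLevel, lto: Lto) -> Vec<&'static str> {
    match lto {
        // Assumed flag spelling; check the linker's own help output.
        Lto::Thin | Lto::Fat => vec!["-Olto"],
        Lto::No => Vec::new(),
    }
}
```

With this shape, an aggressive opt level without LTO yields no extra linker arguments, which matches the reviewer's request that `-Copt-level` alone should not imply a global re-optimisation.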
@@ -149,6 +149,7 @@ pub fn linker_and_flavor(sess: &Session) -> (PathBuf, LinkerFlavor) {
    LinkerFlavor::Ld => "ld",
    LinkerFlavor::Msvc => "link.exe",
    LinkerFlavor::Lld(_) => "lld",
    LinkerFlavor::PtxLinker => "rust-ptx-linker",
Hmm… so we are depending on an external project by default here. Something that users are very unlikely to have installed by default.
I wonder what the error looks like when `rust-ptx-linker` is not in `$PATH`. Perhaps it would make sense to ship `rust-ptx-linker` as part of rustlib, like we do with e.g. the `lld` component for some targets...
That's true. On the other hand, to get started with CUDA development in Rust, users will follow either docs or tutorials, which should mention how to set up the environment.
The error message is, currently:
error: linker `rust-ptx-linker` not found
|
= note: No such file or directory (os error 2)
error: aborting due to previous error
Personally, I would love to have the linker shipped via rustup. But perhaps it would be preferable to implement this as a separate PR?
> But perhaps, it will be preferable to implement this as a separate PR?
Sure.
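For anyone wanting to check up front whether the external linker is reachable, a small probe along these lines may help. It is only a sketch: `linker_available` is a hypothetical helper, and passing `--version` is an assumption (the probe only relies on whether the process can be spawned at all, which is where the "No such file or directory (os error 2)" case surfaces).

```rust
use std::io::{self, ErrorKind};
use std::process::{Command, Stdio};

/// Hypothetical helper: returns `Ok(false)` when the binary cannot be
/// found on `$PATH`, `Ok(true)` when it could be started, and any other
/// I/O error unchanged.
pub fn linker_available(linker: &str) -> io::Result<bool> {
    let spawned = Command::new(linker)
        .arg("--version") // assumed to be harmless; the exit status is ignored
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn();

    match spawned {
        Ok(mut child) => {
            // Reap the child so it does not linger as a zombie process.
            child.wait()?;
            Ok(true)
        }
        Err(err) if err.kind() == ErrorKind::NotFound => Ok(false),
        Err(err) => Err(err),
    }
}
```

A build script or setup tool could call `linker_available("rust-ptx-linker")` and print installation instructions instead of letting the build fail at link time.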
@nagisa @peterhj thanks for reviewing! I can agree about tests - normally these 3 How can I use That's why I had an idea about introducing |
Currently, AFAIK there is no way. Minor changes to compiletest are necessary to make it possible (i.e. you can tell codegen test to generate assembly, but no way to tell compiletest to look at anything but |
Thanks for this! The only comment I'd add is that we unfortunately don't have resources for another image to run on Travis right now, but could this be folded into an existing builder that's already performing well under 2h on average? |
@alexcrichton I can see that Can I add NVPTX tests there? The only thing, it feels like the image should be renamed then. Do you have suggestions about the naming? |
@denzp sure yeah so long as it still runs in under 2 hrs should be fine to add! It should be fine to rename it to something like |
ifeq ($(TARGET),nvptx64-nvidia-cuda)
all:
	$(RUSTC) main.rs -Clink-arg=--arch=sm_60 --crate-type="bin" -O --target $(TARGET)
Does `RUSTFLAGS=-C target-cpu=sm_60` achieve the same effect here as passing the link flags?
Thanks, that's a nice catch.
I've addressed this in the latest commit by passing `target_cpu` to the linker via the `--fallback-arch` arg.
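The fix described here can be pictured with a small sketch: prefer the user's `-C target-cpu` value and fall back to the target spec's default CPU otherwise. The function name `fallback_arch_args` and its signature are hypothetical; only the `--fallback-arch` argument and the `sm_20` default come from this PR.

```rust
/// Hypothetical helper: build the `--fallback-arch` linker arguments.
/// `target_cpu` models an optional `-C target-cpu=...` value;
/// `default_cpu` models the `cpu` field of the target specification.
pub fn fallback_arch_args(target_cpu: Option<&str>, default_cpu: &str) -> Vec<String> {
    let arch = target_cpu.unwrap_or(default_cpu);
    vec!["--fallback-arch".to_string(), arch.to_string()]
}
```

Under this sketch, `fallback_arch_args(Some("sm_60"), "sm_20")` selects `sm_60`, while passing `None` keeps the target's default.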
@alexcrichton thanks, I've merged the images as suggested!
👍
The compiler bits look good to me. I’ll assume that Alex’s 👍 is an approval for the infra changes. @bors r+
📌 Commit 49931fd has been approved by |
NVPTX target specification
This change adds a built-in `nvptx64-nvidia-cuda` GPGPU no-std target specification and basic PTX assembly smoke tests.
The approach taken here and the target spec are based on `ptx-linker`, a project started about 1.5 years ago. Key feature: bitcode object files are linked with LTO into the final module on the linker's side.
Prior to this change, the linker used the `ld` linker-flavor, but I think having a special CLI convention is a more reliable way.
Questions about further progress on a reliable CUDA workflow with Rust:
1. Is it possible to create a test suite `codegen-asm` to verify end-to-end integration with the LLVM backend?
2. How would it be better to organise no-std `compile-fail` tests: add `#![no_std]` where possible and mark others with an `ignore-nvptx` directive, or alternatively, introduce a `compile-fail-no-std` test suite?
3. Can we have `ptx-linker` eventually integrated the way `rls` or `clippy` are? Hopefully, this would allow it to statically link against the LLVM used in Rust and get rid of the [current hacky solution](https://github.com/denzp/rustc-llvm-proxy).
4. Am I missing some methods from `rustc_codegen_ssa::back::linker::Linker` that can be useful for bitcode-only linking?
Currently, there are no major public CUDA projects written in Rust that I'm aware of, but I expect that having a built-in target will create a solid foundation for further experiments and awesome crates.
Related to #38789
Fixes #38787
Fixes #38786
☀️ Test successful - checks-travis, status-appveyor
Thank you @denzp !
Add NVPTX target to a build manifest Include `nvptx64-nvidia-cuda` target to a build manifest. I forgot this step at my first take on adding the target (rust-lang#57937). Hopefully, this is the only reason why `rustup target add nvptx64-nvidia-cuda` doesn't work 🙁 r? @alexcrichton
The following targets are now built and distributed: - aarch64-unknown-none: rust-lang/rust#68334 - mips64-unknown-linux-muslabi64: rust-lang/rust#65843 - mips64el-unknown-linux-muslabi64: rust-lang/rust#65843 - nvptx64-nvidia-cuda: rust-lang/rust#57937 - riscv32i-unknown-none-elf: rust-lang/rust#62784 - riscv64gc-unknown-linux-gnu: rust-lang/rust#68037 - thumbv8m.base-none-eabi: rust-lang/rust#59182 - thumbv8m.main-none-eabi : rust-lang/rust#56954 - thumbv8m.main-none-eabihf: rust-lang/rust#59182