-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new intrinsic is_constant
and optimize pow
#114390
Conversation
(rustbot has picked a reviewer for you, use r? to override) |
Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri |
Make it insta-const and just always return |
This comment has been minimized.
This comment has been minimized.
That's not really it. This let's llvm know that we have a code path it will be able to optimize better if the arg is const, than the code path taken if the arg is not const. This is a hint, similar to how we have |
Yeah that's true, I was referring to the behavior though. In practice it'll do basically that, folding the next statements to a constant then removing the branch altogether |
723ab6d
to
e317dbd
Compare
The Miri subtree was changed cc @rust-lang/miri |
e317dbd
to
acbf90a
Compare
Some changes occurred in src/tools/cargo cc @ehuss |
?? How'd that happen? |
This comment has been minimized.
This comment has been minimized.
|
You still have a cargo submodule update for cargo in your branch, please remove that. Looks like it is in an isolated commit so a Not sure how that ended up in your branch, my guess would be a rebase got wrong. Please never rebase unless necessary. |
let mut saw_false = false; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut saw_false = false; | |
let mut saw_false = false; |
just a nit to make it visually more clear that these are all together testing one thing
EDIT: hm I tried to remove the empty line, not sure if github supports that...
I just ran it with |
You can use It's ok for opts to differ between llvm versions, so if older llvm ends up returning |
☔ The latest upstream changes (presumably #114776) made this pull request unmergeable. Please resolve the merge conflicts. |
@Centri3 any updates on this? |
No, I can add that comment and rebase but I won't have the time to debug it if it doesn't work |
☔ The latest upstream changes (presumably #117525) made this pull request unmergeable. Please resolve the merge conflicts. |
This could be useful for integer division too. While it is generally best to leave these kinds of optimizations to the compiler, the current version of LLVM produces sub-optimal division code in some cases. When the integer width is two machine words (u128 on 64-bit machines), LLVM only does the divide by constant optimization if the odd factor of the divisor can exactly divide 2bits - 1. Otherwise, it calls a function. When the integer width is four machine words (u128 on 32-bit machines), LLVM divides using subtraction, even when the divisor is a constant. |
Are there LLVM issues tracking these missed optimizations? |
There are a few related issues. While these are all open as of writing, some seem to be partly out of date:
In Rust: Many use 10 or 5 as an example. Since rustc 1.70 (corresponding to llvm-16), dividing by 10 and 5 is optimized when the dividend is 2 machine words. The issues seem to all be about dividing two machine words, while there are no issues specific to more than two words. @RalfJung, If you are thinking about posting an issue yourself, consider emphasizing 4 word division. Here is a Compiler Explorer link to Rust code showing division by small constants. It is currently set to i686. To change the target back to the default, go to Overrides > Target architecture, and select the blank option at the top of the drop down menu. Here is a link to edit the llvm IR. Note the differences between versions 15 and 16. I am not certain, but I have a hunch that the difference between divisors that are optimized for 2 words and divisors that aren't has to do with the divisibility of 2n - 1. I think so because of this paragraph on page 1 of T. Granlund, P. Montmery (1994):
For reference:
|
No I just wanted to be sure we don't carry work-arounds for LLVM issues that LLVM isn't aware of. Also now I can ping @nikic to get an estimate of whether it makes more sense to prioritize improving LLVM optimizations vs providing an is_compile_time_known intrinsic. :) Though to be clear, nothing is blocking such an is_compile_time_known, it just needs someone to finish the work that was started in this PR. |
@RalfJung The above discussion about divisions is irrelevant to this PR -- you obviously shouldn't use is_compile_time_known() to optimize divisions in the frontend. As for the 2.pow issue being actually solved here, using is_compile_time_known() is a very reasonable approach to that problem. I mean, we could add a loop idiom recognition pattern to LLVM to handle this, but it seems like a pretty silly thing to do, and will be fragile if what rustc emits and what LLVM matches for diverges. |
/// ```rust | ||
/// if !is_val_statically_known(0) { unreachable_unchecked(); } | ||
/// ``` | ||
/// | ||
/// This also means that the following code's behavior is unspecified; it | ||
/// may panic, or it may not: | ||
/// | ||
/// ```rust,no_run | ||
/// assert_eq!(is_val_statically_known(0), black_box(is_val_statically_known(0))) | ||
/// ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both doc tests fail when ran with ./x test core --stage 1 --doc -- is_val_statically_known
. The doc tests are never ran with --stage 0
because of #[cfg(not(bootstrap())]
.
You need to add unsafe
, use
, and feature
statements to make them compile. If you think they should be omitted from the docs, then you can prepend them with #
.
Here is an example that passes the tests:
/// ```rust
/// #![feature(is_val_statically_known)]
/// #![feature(core_intrinsics)]
/// use std::hint::unreachable_unchecked;
/// use std::intrinsics::is_val_statically_known;
///
/// unsafe {
/// if !is_val_statically_known(0) { unreachable_unchecked(); }
/// }
/// ```
///
/// This also means that the following code's behavior is unspecified; it
/// may panic, or it may not:
///
/// ```rust,no_run
/// #![feature(is_val_statically_known)]
/// #![feature(core_intrinsics)]
/// use std::hint::black_box;
/// use std::intrinsics::is_val_statically_known;
///
/// unsafe {
/// assert_eq!(is_val_statically_known(0), black_box(is_val_statically_known(0)));
/// }
/// ```
The GCC backend could implement this with I don't understand the work flow for the other backends so I don't know if the intrinsic should be added in this PR or if it needs to be done in the backends separate repos. |
}; | ||
// So it panics. Have to use `overflowing_mul` to efficiently set the | ||
// result to 0 if not. | ||
#[cfg(debug_assertions)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is incorrect to use #[cfg(debug_assertions)]
here. In general, we can't assume debug_assertions
matches overflow_checks
even though they go together by default. This is especially true because this function has the #[rustc_inherit_overflow_checks]
.
Without #[cfg(debug_assertions)]
, it is still incomplete because the greatest possible u32 is much greater than the number of bits in an integer.
To correctly check for overflow, test to see if the final result is less than 1. If that is the case use _ = <Self>::MAX * <Self>::MAX;
to force a panic iff overflow_checks
are enabled.
Although it isn't necessary, you can make the code less verbose by using saturating_mul
instead of overflowing_mul
and by using (1 as Self).checked_shl().unwrap_or_default()
instead of (1 << num_shl) * fine as Self
.
@Centri3 I'd be happy to take over if you won't be able to finish it. |
Add Benchmarks for int_pow Methods. There is quite a bit of room for improvement in performance of the `int_pow` family of methods. I added benchmarks for those functions. In particular, there are benchmarks for small compile-time bases to measure the effect of rust-lang#114390. ~~I added a lot (245), but all but 22 of them are marked with `#[ignore]`. There are a lot of macros, and I would appreciate feedback on how to simplify them.~~ ~~To run benches relevant to rust-lang#114390, use `./x bench core --stage 1 -- pow_base_const --include-ignored`.~~
Add Benchmarks for int_pow Methods. There is quite a bit of room for improvement in performance of the `int_pow` family of methods. I added benchmarks for those functions. In particular, there are benchmarks for small compile-time bases to measure the effect of rust-lang#114390. ~~I added a lot (245), but all but 22 of them are marked with `#[ignore]`. There are a lot of macros, and I would appreciate feedback on how to simplify them.~~ ~~To run benches relevant to rust-lang#114390, use `./x bench core --stage 1 -- pow_base_const --include-ignored`.~~
…li-obk Replacement of rust-lang#114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See rust-lang#114390 for more discussion. While I have extended the scope of the power of two optimization from rust-lang#114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests. Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage). Fixes rust-lang#47234 Resolves rust-lang#114390 `@Centri3`
…li-obk Replacement of rust-lang#114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See rust-lang#114390 for more discussion. While I have extended the scope of the power of two optimization from rust-lang#114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests. Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage). Fixes rust-lang#47234 Resolves rust-lang#114390 `@Centri3`
…li-obk Replacement of rust-lang#114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See rust-lang#114390 for more discussion. While I have extended the scope of the power of two optimization from rust-lang#114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests. Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage). Fixes rust-lang#47234 Resolves rust-lang#114390 `@Centri3`
…li-obk Replacement of rust-lang#114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See rust-lang#114390 for more discussion. While I have extended the scope of the power of two optimization from rust-lang#114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests. Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage). Fixes rust-lang#47234 Resolves rust-lang#114390 `@Centri3`
This was suggested in #114254. This basically lets LLVM know that it can evaluate the next few statements at compile-time, and thus can completely remove them. This allows us to select a path at compile-time, and with optimizations on, the other will be entirely removed. This will likely be a bit slower in debug mode but this should be ok.
I've done this for the integer's
pow
methods as was done in #114254 and it works great!We should probably restrict the types that can be passed to it, though,
llvm.is.constant
can be passed any type but some ICE. (So far, I know that all ZSTs cause an ICE)cc @rust-lang/lang; I highly doubt this shouldn't be insta-
const
, but I think discussion should be warranted anyway.