Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add intrinsics for bigint helper methods #131566

Closed

Conversation

clarfonthey
Copy link
Contributor

@clarfonthey clarfonthey commented Oct 11, 2024

Tracking issue: #85532

This adds the following intrinsics:

  • add_with_carry, the implementation for carrying_add
  • sub_with_carry, the implementation for borrowing_sub
  • mul_double, the implementation for widening_mul
  • mul_double_add, the implementation for carrying_mul
  • mul_double_add2, the implementation for a new method, carrying2_mul, which is like carrying_mul but accepts two arguments to add instead of one

Right now, there is no significant advantage to the add_with_carry and sub_with_carry intrinsics on LLVM over the existing implementation, since the intrinsic version is effectively identical in implementation. However, the GCC backend does have dedicated intrinsics for these, and LLVM and other backends may eventually support them.

The largest advantage comes from the multiplication intrinsics, since they can internally cast up to 256-bit integer multiplication for i128 and u128, which is not currently possible in Rust alone. The result will very likely optimize better than if it were written by hand.

A few additional notes:

  • Since most of these methods are at least ternary, it's not possible to lower these methods to MIR binary operations. The only one where this would be possible is widening_mul, and it's unclear whether it's worth special-casing that one. For now, they're all lowered as function calls.
  • It's unclear what the final names for the stable functions will be, or if the double-carry multiplication operator will be stabilized. However, since intrinsics are always unstable, these can easily be removed in the future, and the amount of extra code they add is relatively insignificant. Having the intrinsics available in the bootstrap compiler eventually will make working with these easier: removing them is easy, but adding them in is difficult, due to the special cases needed.
  • This adds the signed version of all the multiplication operators, which was previously removed. This uses the new recommendation to have an always-unsigned insignificant word, despite the significant word always being signed. Since it makes no difference for addition, the carries for signed multiplication are always signed. However, to simplify the intrinsic, two words of the same type are always returned and the inherent methods choose to cast one of them unsigned as needed.
  • A lot of care was put into tailoring the docs for existing methods to explain their purpose and using them correctly. I have not taken that care for the double-carry and signed method docs, and someone else can do that in a future change.
  • Several tests have been added to libcore to ensure that these methods work correctly.

I'm marking this PR as WIP because I have only done the LLVM implementation of the intrinsics, and it feels appropriate to do the other backends as well, since these feel like pretty fundamental operations that people will want to use. In particular, miri needs to also know how to execute these intrinsics as well, and I haven't implemented that either.

@rustbot
Copy link
Collaborator

rustbot commented Oct 11, 2024

r? @jhpratt

rustbot has assigned @jhpratt.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 11, 2024
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@jhpratt
Copy link
Member

jhpratt commented Oct 12, 2024

Not familiar with LLVM to the extent necessary. I think this is T-compiler land given that it's an intrinsic?

r? compiler

@rustbot rustbot assigned cjgillot and unassigned jhpratt Oct 12, 2024
@clarfonthey
Copy link
Contributor Author

Not familiar with LLVM to the extent necessary. I think this is T-compiler land given that it's an intrinsic?

r? compiler

Sorry I forgot to respond to this, but: yes. The goal here is to add the intrinsics without worrying about the libs side since everything is unstable anyway.

@clarfonthey
Copy link
Contributor Author

Gonna be honest, I managed to figure out how the LLVM code works, I managed to figure out how the consteval code works, and I will probably be able to figure out how the cranelift code works. I have absolutely no idea how the GCC code is supposed to work, especially with the various ABI calling conventions in the code that look like they're statically determined, but somehow also not.

I initially wanted to offer fallback implementations for these intrinsics, but that's not really possible since there's a lot of maths that can't really be done in a type-generic way without adding way too many bounds on the intrinsics. I'm just going to leave these unimplemented for now and someone with way more experience on these backends can pick up the slack. Or, if someone here wants to give me a bit more advice on how this would even be implementable, I can maybe try to do it myself.

I also don't know the best way of testing the consteval code because the ability to run tests in const context is basically nonexistent. I could probably be more creative and figure a way around this, like I have for library code, but eh. I'm just going to leave it for now and hand-wave it as an "unstable feature" for now.

It's also just, kinda worth commenting on how much commenting is… not, present on a lot of this code. Like, the Rust code is fine, but since most of these backends are used over FFI, there are a lot of weird conventions that have to be followed and I have absolutely no idea what those are. I filed #131562 to cover part of this but it feels like lack of documentation in the compiler is probably a much bigger issue than I perceived it would be.

Like, I know that a lot of the rustc dev guide stuff says you can message people on Zulip for advice, but uh. "Message someone in chat" is not documentation, obviously. And I hate bothering people.

That's my rambling for now. Will try and make sure I can get at least everything to compile and the tests to pass. I know that the LLVM backend code works for sure, since I got it to pass all the library tests, but everything else is a massive shrug emoji for now.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@clarfonthey clarfonthey marked this pull request as ready for review October 14, 2024 10:43
@rustbot
Copy link
Collaborator

rustbot commented Oct 14, 2024

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the Miri implementation! :)

However, it does not seem to be tested?

(lo, hi)
}
} else {
let prod = l * r + c1 + c2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why this does not cause overflow. Also please use strict_ operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will convert to strict, although I thought that the lack of overflow was evident by the fact that the 128-bit case was covered separately-- all other integers would be maximum 64 bits, where this operation cannot overflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are multiplying two 64bit numbers here, which results in a 128bit number. Then you add more stuff. u64::MAX * u64::MAX is fairly close to u128::MAX, and then we add stuff... why can't this overflow? I have no intuition for this, so it definitely needs comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that u64::MAX * u64::MAX + u64::MAX + u64::MAX is actually u128::MAX, which is the principle behind allowing up to two carries for double-wide multiplication.

But yes, I'll add comments.

}
#[cfg(not(bootstrap))]
{
let (lo, hi) = l.carrying2_mul(r, c1, c2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate... for basic arithmetic like this, it'd be better if we had our own implementation of them rather than having to use the host operation. How hard would that be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, my thought process here is that the ideal solution is to replace the current hard-coded i128/u128 version of the code with a bigint implementation and do the multiplication directly in all cases. That would be the most ideal, and it would support a potential future with integers larger than 128 bits. It would also likely use methods like this one to perform bigint multiplication.

However, without that, my thought process was that I could either manually code a version of carrying2_mul here that would perform worse and require extra scrutiny, or just use the version that's already been implemented and tested.

I'll defer to whatever you think is the better option, but that at least explains my reasoning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how hard this is to implement directly, is there some code somewhere that would give me an idea?

It's also not great to have a completely different codepath for u128 and the rest, that makes proper testing more tricky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, just sharing a few of the implementations mentioned on the tracking issue:

Note that also, regardless of what we do, the result of double-wide multiplication of 128-bit integers is going to be two 128-bit integers, and it's only going to be the case of 128-bit integers where we need to scoop out the extra data from the more-significant 128 bits. So, effectively, even if I had the same path for all integers using the 128-bit double-wide mul, we'd still be special-casing 128 bits by only looking at the higher-order word in the 128-bit case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with a bigint implementation

FWIW I'd love that for all our arithmetic.^^ It's probably too slow though. And using it only sometimes seems odd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, without that, my thought process was that I could either manually code a version of carrying2_mul here that would perform worse and require extra scrutiny, or just use the version that's already been implemented and tested.

So, my thought process here is that we typically want to be independent from possible bugs in the standard library, and provide our own reference implementation. But, we haven't done that in numeric_intrinsic, so it'd be odd to use a higher standard here.

So fair, please stick with the current implementation, just with more comments.

Comment on lines +628 to +632
let overflowed = Scalar::from_bool(if l.layout.abi.is_signed() {
overflowed1 != overflowed2
} else {
overflowed1 | overflowed2
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining the logic here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely should just update the intrinsic docs to clarify the behaviour here, although the standard library has a bit of a weird relationship with how it documents intrinsics.

Most of them rely on having stabilised versions that they can just point to, since those will have the proper docs. The documentation here is split between iN::carrying_add, uN::carrying_add, iN::borrowing_sub, and uN::borrowing_sub, but the bottom line is that signed methods are merely checking for overflow, whereas unsigned methods want to actually return a new carry bit that can be chained along. That's what we're testing for in the methods and I'm just duplicating that here.

Not sure what the best solution for documentation would be here; open to ideas. I could just link those docs here, for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't understand why the signed thing overflowed if exactly one of the sub-operations overflowed. Like I could probably think about it for a minute and figure it out, but there should really be comments explaining that.

Your answer confused me even more, in which sense is the result different for signed vs unsigned? That should also be documented...

@@ -573,6 +601,106 @@ impl<'tcx, M: Machine<'tcx>> InterpCx<'tcx, M> {
})
}

pub fn carrying_arith(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a doc comment to both new methods

@@ -626,6 +626,17 @@ where
self.write_immediate(Immediate::Scalar(val.into()), dest)
}

/// Write a scalar pair to a place
#[inline(always)]
pub fn write_scalar_pair(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want this helper. We have immediates specifically to represent these pairs. Also this encourages to have scalar pairs without having their type, which is dangerous. I think we actually want to change write_immediate to take an ImmTy instead of an Immediate, but that's a larger change... but this helper moves us in the wrong direction IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having typed immediates everywhere, I just know that this isn't what the code is doing right now, and that's why I added this method, since it's either this or import Immediate directly into the intrinsics module and construct one myself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or import Immediate directly into the intrinsics module and construct one myself.

Yes please do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had assumed that the presence of write_scalar was to avoid that, but I'll keep that in mind. Perhaps write_scalar should also be removed if the goal is to have typed immediates everywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scalars are much less at risk of odd effects due to bad types since there's no field offsets / padding being computed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense. I wasn't aware that this padding/offset was even meaningful since I thought that the reason for having a "pair" primitive was explicitly to avoid this.

@RalfJung
Copy link
Member

This uses the new recommendation to have an always-unsigned insignificant word, despite the significant word always being signed. Since it makes no difference for addition, the carries for signed multiplication are always signed. However, to simplify the intrinsic, two words of the same type are always returned and the inherent methods choose to cast one of them unsigned as needed.

I have no idea what this refers to, nor what it means. But the docs don't seem to talk about it at all, so that seems bad? If this pertains to the behavior of the intrinsics, then please add it to the intrinsic docs.

I also don't know the best way of testing the consteval code because the ability to run tests in const context is basically nonexistent. I could probably be more creative and figure a way around this, like I have for library code, but eh. I'm just going to leave it for now and hand-wave it as an "unstable feature" for now.

We have two ways of testing them. One is to add some functions in core/tests that call the intrinsics from a const block. That needs cfg(not(bootstrap)), and then you can use ./x.py test library/core --stage 1 to test it. The other is to do the same in a ui test. I would say in this case putting them in core/tests is better as that's where the other tests also are. (We should probably migrate some of the tests from ui tests to regular lib tests... oh well.)

@clarfonthey
Copy link
Contributor Author

So, my main issue with testing these is that I was running into lots of headaches with the common testing faculties being unusable in const context. I will admit that when I was trying to get it working, I got frustrated and gave up instead of remembering some of the alternatives I've used in the past, but it was still frustrating.

The biggest issue is that tuple equality does not work in const context, and so I have to resort to other methods. What seems most reasonable is to use assert!(matches!(...)) since I've used that in the past to some success.

@RalfJung
Copy link
Member

Yeah, either matches! or some sort of custom macro should work.

@clarfonthey
Copy link
Contributor Author

So, I'm going to close this, not because I want to stop working on it, but because I feel like it's going to be easier to work on this in smaller chunks, like just carrying_add and borrowing_sub, first, alongside other stuff like #131707.

So, it's not over, but, gonna be split up a bit.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Oct 23, 2024
…=thomcc

Run most `core::num` tests in const context too

This adds some infrastructure for something I was going to use in rust-lang#131566, but it felt worthwhile enough on its own to merge/discuss separately.

Essentially, right now we tend to rely on UI tests to ensure that things work in const context, rather than just using library tests. This uses a few simple macro tricks to make it *relatively* painless to execute tests in both runtime and compile-time context. And this only applies to the numeric tests, and not anything else.

Recommended to review without whitespace in the diff.

cc `@RalfJung`
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Oct 23, 2024
Rollup merge of rust-lang#131707 - clarfonthey:constify-core-tests, r=thomcc

Run most `core::num` tests in const context too

This adds some infrastructure for something I was going to use in rust-lang#131566, but it felt worthwhile enough on its own to merge/discuss separately.

Essentially, right now we tend to rely on UI tests to ensure that things work in const context, rather than just using library tests. This uses a few simple macro tricks to make it *relatively* painless to execute tests in both runtime and compile-time context. And this only applies to the numeric tests, and not anything else.

Recommended to review without whitespace in the diff.

cc `@RalfJung`
@RalfJung
Copy link
Member

Looks like #133663 is an alternative proposal for at least some of the helper methods here.

@scottmcm
Copy link
Member

Oh, thanks for the backlink! I didn't know this PR existed.

Yeah, I approached this differently in two major ways:

  1. Saying that it needs to have fallback MIR, because by golly I never want to implement an intrinsic for all four backends again, given the option.
  2. Just making one intrinsic for a*b+c+d, trusting that LLVM will optimize out +0 just fine. (As will cranelift, if it has optimizations enabled.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants