Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new compare_bytes intrinsic instead of calling memcmp directly #114382

Merged
merged 2 commits into from
Aug 7, 2023

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Aug 2, 2023

As discussed in #113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target. (I didn't actually add those checks, though, since as I understood it we didn't actually need them on known targets?)

Doing this also let me make it const (unstable), which I don't think extern "C" fn memcmp can be.

cc @RalfJung @Amanieu

@rustbot
Copy link
Collaborator

rustbot commented Aug 2, 2023

r? @cjgillot

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 2, 2023
@rustbot
Copy link
Collaborator

rustbot commented Aug 2, 2023

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

@rust-log-analyzer

This comment has been minimized.

@scottmcm scottmcm force-pushed the compare-bytes-intrinsic branch from 51c1830 to e5b23a6 Compare August 2, 2023 20:38
@rust-log-analyzer

This comment has been minimized.

@scottmcm scottmcm force-pushed the compare-bytes-intrinsic branch from e5b23a6 to 029d660 Compare August 2, 2023 21:10
@rust-log-analyzer

This comment has been minimized.

@scottmcm scottmcm force-pushed the compare-bytes-intrinsic branch from 029d660 to 4bccdab Compare August 2, 2023 21:28
@rust-log-analyzer

This comment has been minimized.

@scottmcm scottmcm force-pushed the compare-bytes-intrinsic branch from 4bccdab to f732b5a Compare August 2, 2023 21:57
@rustbot
Copy link
Collaborator

rustbot commented Aug 2, 2023

The Miri subtree was changed

cc @rust-lang/miri

@rust-log-analyzer

This comment has been minimized.

@scottmcm scottmcm force-pushed the compare-bytes-intrinsic branch from f732b5a to d29af67 Compare August 2, 2023 22:28
///
/// # Safety
///
/// * If `bytes` is zero, `left` and `right` must be non-null.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// * If `bytes` is zero, `left` and `right` must be non-null.
/// * Even if `bytes` is zero, `left` and `right` must be non-null.

otherwise this reads surprisingly if you don't already know about this

let left = self.read_pointer(left)?;
let right = self.read_pointer(right)?;
let n = Size::from_bytes(self.read_target_usize(byte_count)?);
let result = if n == Size::ZERO {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't know that it would already do the right null checks without also requiring "real" alloc ids. I'll remove the extra checks.

(And yes, that's where I got read_bytes_ptr_strip_provenance 🙂)

@rustbot author

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. :)

The checks Miri's functions do by default should exactly match what the docs call "valid for reads of size N", that's why you don't need any extra checks. This also ensures we apply the exact same checks everywhere (direct memory accesses, memcpy, memset, memcmp, ...).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like the synergy between that and your notes about how to document it.

(More updates coming for the other comments.)

Comment on lines 2394 to 2404
/// # Safety
///
/// * If `bytes` is zero, `left` and `right` must be non-null.
///
/// * If `bytes` is non-zero,
///
/// * `left` must be [valid] for reads of `bytes` bytes,
/// all of which must be initialized.
///
/// * `right` must be [valid] for reads of `bytes` bytes,
/// all of which must be initialized.
Copy link
Member

@RalfJung RalfJung Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should follow the usual documentation of all our intrinsics: left and right must be valid for reads of size bytes. That's all we need to say.

We already say that as far as Rust is concerned, 4 as *const u8 is valid for reads of size 0. We shouldn't special-case "if bytes is zero" here in the docs nor in the Miri implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, if we ever declare that even the null pointer is valid for zero-sized reads and writes (which we may, rust-lang/opsem-team#12), then even passing null to these intrinsics should be allowed.

--> $DIR/const-compare-bytes-ub.rs:34:9
|
LL | compare_bytes([1].as_ptr(), MaybeUninit::uninit().as_ptr(), 1)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ reading memory at alloc25[0x0..0x1], but memory is uninitialized at [0x0..0x1], and this operation requires initialized memory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that involves provenance? As in, pass in a buffer that contains pointers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like compare_bytes([&0].as_ptr(), [&0].as_ptr(), size_of_val(&0)) should probably do it

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 3, 2023
@@ -18,7 +18,7 @@ Respanned: TokenStream [Ident { ident: "$crate", span: $DIR/auxiliary/make-macro
use core /* 0#1 */::prelude /* 0#1 */::rust_2018 /* 0#1 */::*;
#[macro_use /* 0#1 */]
extern crate core /* 0#1 */;
extern crate compiler_builtins /* 443 */ as _ /* 0#1 */;
extern crate compiler_builtins /* 444 */ as _ /* 0#1 */;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how this is impacted, but 🤷

@scottmcm scottmcm force-pushed the compare-bytes-intrinsic branch from d29af67 to 1f4b525 Compare August 4, 2023 06:01
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Aug 6, 2023
@klensy
Copy link
Contributor

klensy commented Aug 6, 2023

Comments here too?

// Use memcmp for bytewise equality when the types allow

// memcmp compares a sequence of unsigned bytes lexicographically.

//! * `memcpy`, `memcmp`, `memset`, `strlen` - These are core memory routines which are

// This isn't an "LLVM intrinsic", but LLVM's optimization passes
// recognize it like one and we assume it exists in `core::slice::cmp`
match self.sess().target.arch.as_ref() {
"avr" | "msp430" => ifn!("memcmp", fn(ptr, ptr, t_isize) -> t_i16),
_ => ifn!("memcmp", fn(ptr, ptr, t_isize) -> t_i32),
}

@cjgillot
Copy link
Contributor

cjgillot commented Aug 6, 2023

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 6, 2023
@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2023

This one

//! * `memcpy`, `memcmp`, `memset`, `strlen` - These are core memory routines which are

has an ongoing PR clarifying our documentation: #114412.

@scottmcm scottmcm force-pushed the compare-bytes-intrinsic branch from 1ac2710 to febffe2 Compare August 6, 2023 22:04
@rust-log-analyzer

This comment has been minimized.

@scottmcm scottmcm force-pushed the compare-bytes-intrinsic branch from febffe2 to 75277a6 Compare August 6, 2023 22:47
@scottmcm
Copy link
Member Author

scottmcm commented Aug 6, 2023

Updated those comments in https://github.com/rust-lang/rust/compare/1ac2710780489b50a5feb5e838f8c81a66720dbc..febffe212880dfef083bb860de45abc5aa075a3a, and rebased to pick up another proc-macro test change.

@bors r=cjgillot

@bors
Copy link
Contributor

bors commented Aug 6, 2023

📌 Commit 75277a6 has been approved by cjgillot

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 6, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 7, 2023
…iaskrgr

Rollup of 8 pull requests

Successful merges:

 - rust-lang#98935 (Implement `Option::take_if`)
 - rust-lang#114093 (Add regression test for `echo 'mod unknown;' | rustc -`)
 - rust-lang#114229 (Nest tests/codegen/sanitizer*.rs tests in sanitizer dir)
 - rust-lang#114230 (Nest other codegen test topics)
 - rust-lang#114362 (string.rs: remove "Basic usage" text)
 - rust-lang#114365 (str.rs: remove "Basic usage" text)
 - rust-lang#114382 (Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly)
 - rust-lang#114549 (Style fix and refactor on resolve diagnostics)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit cbe2522 into rust-lang:master Aug 7, 2023
@rustbot rustbot added this to the 1.73.0 milestone Aug 7, 2023
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Aug 9, 2023
…r=cjgillot

Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly

As discussed in rust-lang#113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target.  (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)

Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.

cc `@RalfJung` `@Amanieu`
@scottmcm scottmcm deleted the compare-bytes-intrinsic branch August 15, 2023 20:01
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Sep 5, 2023
document our assumptions about symbols provided by the libc

LLVM makes assumptions about `memcmp`, `memmove`, and `memset` that go beyond what the C standard guarantees -- see https://reviews.llvm.org/D86993. Since we use LLVM, we are inheriting these assumptions.

With rust-lang#114382 we are also making a similar assumption about `memcmp`, so I added that to the list.

Fixes rust-lang/unsafe-code-guidelines#426.
RalfJung pushed a commit to RalfJung/miri that referenced this pull request Sep 6, 2023
document our assumptions about symbols provided by the libc

LLVM makes assumptions about `memcmp`, `memmove`, and `memset` that go beyond what the C standard guarantees -- see https://reviews.llvm.org/D86993. Since we use LLVM, we are inheriting these assumptions.

With rust-lang/rust#114382 we are also making a similar assumption about `memcmp`, so I added that to the list.

Fixes rust-lang/unsafe-code-guidelines#426.
antoyo pushed a commit to antoyo/rust that referenced this pull request Oct 9, 2023
…r=cjgillot

Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly

As discussed in rust-lang#113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target.  (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)

Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.

cc `@RalfJung` `@Amanieu`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants