Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement raw API #279

Closed
wants to merge 4 commits into from
Closed

Implement raw API #279

wants to merge 4 commits into from

Conversation

newpavlov
Copy link
Member

@newpavlov newpavlov commented Aug 18, 2022

A simpler alternative to #271.

It introduces getrandom_raw function with the following signature:

pub unsafe fn getrandom_raw(dst: *mut u8, len: usize) -> Result<(), Error> { ... }

Note that getrandom is built on top of this function and all getrandom_impls also now implement the pointer-based API. It fits quite well for all backends except the js one.

It allows to fill uninitialized arrays with random data.

Additionally, as a safe wrapper around getrandom_raw this PR adds getrandom_uninit:

pub fn getrandom_uninit(dst: &mut [MaybeUninit<u8>]) -> Result<&mut [u8], Error> { ... }

Because of MaybeUninit MSRV gets bumped to 1.36 (the same MSRV as rand).

I have reworked the benchmark and I get the following results:

running 6 tests
test bench_large     ... bench:  26,509,202 ns/iter (+/- 131,913) = 79 MB/s
test bench_large_raw ... bench:  26,478,315 ns/iter (+/- 96,640) = 79 MB/s
test bench_page      ... bench:      52,642 ns/iter (+/- 376) = 77 MB/s
test bench_page_raw  ... bench:      52,592 ns/iter (+/- 328) = 77 MB/s
test bench_seed      ... bench:         888 ns/iter (+/- 4) = 36 MB/s
test bench_seed_raw  ... bench:         887 ns/iter (+/- 5) = 36 MB/s

On my (Linux-based) PC the difference between getrandom and getrandom_raw is less than 1% and barely noticeable. The former zeroizes stack-based buffer before calling getrandom, while the latter uses MaybeUninit to skip the unnecessary initialization step.

src/js.rs Show resolved Hide resolved
@newpavlov newpavlov marked this pull request as ready for review August 18, 2022 10:02
benches/mod.rs Outdated Show resolved Hide resolved
@newpavlov
Copy link
Member Author

@josephlr
Could you approve it? Unless, of course, you plan to review it more thoroughly later or have some concerns about the approach.

Copy link
Member

@josephlr josephlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall approach here seems fine, but there would be two things I would like to see:

First, is a better way to deal us not having chunks anymore. A bunch of our code now has repeated sections where we do the same loop manipulations, I'm wondering if we could but something in util.rs to make this nicer. Maybe something similar to sys_fill_exact.

Second, I think we should add #![deny(unsafe_op_in_unsafe_fn)] so that we can better see which operations are unsafe inside our (sometimes large) unsafe functions.

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
@newpavlov
Copy link
Member Author

newpavlov commented Sep 28, 2022

A bunch of our code now has repeated sections where we do the same loop manipulations, I'm wondering if we could but something in util.rs to make this nicer.

Yeah, it could be worthwhile. I will look into it a bit later.

I think we should add #![deny(unsafe_op_in_unsafe_fn)] so that we can better see which operations are unsafe inside our (sometimes large) unsafe functions.

Personally, I am not a huge fan of unsafe_op_in_unsafe_fn. Most unsafe functions (in getrandom and my other crates) are quite straightforward and additional unsafe blocks would only add noise. Also adding it would require #![allow(unknown_lints)], if we don't want warnings on older toolchains. We can revisit it after the lint will be enabled by default.

@briansmith
Copy link
Contributor

pub unsafe fn getrandom_raw(dst: *mut u8, len: usize) -> Result<(), Error> { ... }

It allows to fill uninitialized arrays with random data.

If the goal is to fill uninitialized arrays then the API can and should be should be using &mut [MaybeUninit<u8>] for the output parameter, shouldn't it? Then this function wouldn't need to be declared unsafe either.

@newpavlov
Copy link
Member Author

newpavlov commented Oct 5, 2022

You would still need to use unsafe assume_init to read generated data. Plus sometimes you may want to fill something like &mut [u64] and it's easier and more straightforward to cast it into *mut u8 than to &mut [MaybeUninit<u8>].

We could introduce:

pub fn getrandom_uninit(buf: &mut [MaybeUninit<u8>]) -> Result<&mut [u8], Error> { .. }

as a wrapper around getrandom_raw. But there is an issue with MaybeUninit being introduced in Rust 1.36, while our MSRV is 1.34. I guess, we could work around it by introducing a disabled by default feature.

@briansmith
Copy link
Contributor

pub fn getrandom_uninit(buf: &mut [MaybeUninit<u8>]) -> Result<&mut [u8], Error> { .. }

Yes, this is what I meant. Sorry I forgot to mention the return value. If that existed, getrandom_raw wouldn't need to exist, and there would be less unsafe API exposed from the crate.

But there is an issue with MaybeUninit being introduced in Rust 1.36, while our MSRV is 1.34. I guess, we could work around it by introducing a disabled by default feature.

Maybe just increase the MSRV. 1.36 was 2 years ago.

@newpavlov
Copy link
Member Author

I don't think there is anything wrong with exposing unsafe API, if there are legitimate uses for it.

Maybe just increase the MSRV. 1.36 was 2 years ago.

Yeah, I guess we could do it (though I personally use a stricter MSRV policy) considering that rand has MSRV equal to 1.36.

@briansmith
Copy link
Contributor

I don't think there is anything wrong with exposing unsafe API, if there are legitimate uses for it.

I think it would be better to remove it from the API, as in general we should prefer to expose safe APIs instead of unsafe ones. I don't think the _raw version is useful if we have the _uninit variant; if a caller only has a pointer and a length, the caller can construct the MaybeUninit slice from that, and then use the _uninit function.

Copy link
Contributor

@briansmith briansmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review every line because my feedback here is really just trying to make a single point: Let's not replace so much safe code in this crate with a large quantity of unsafe code. Instead, let's find a way to keep most of the logic safe. Of course we have to call into FFI functions and deal with uninitialized memory, but even still we can and should use unsafe (and dangerous integer casts) much less often.

src/custom.rs Outdated Show resolved Hide resolved
sys_fill: impl Fn(&mut [u8]) -> libc::ssize_t,
pub unsafe fn sys_fill_exact(
mut dst: *mut u8,
mut len: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think this would be better as &mut [MaybeUninit<u8>] instead. Then we could revert it back to be safe (remove unsafe from its declaration).


pub unsafe fn raw_chunks(
mut dst: *mut u8,
mut len: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this could be made a safe function in a similar way, and all the array length arithmetic can be replaced with use of split_at. A name change would be warranted.

pub unsafe fn getrandom_inner(dst: *mut u8, len: usize) -> Result<(), Error> {
// NOTE: WASI is a 32-bit target, it can not have objects bigger
// than `i32::MAX - 1`, so we do not need chunking here
match random_get(dst as i32, len as i32) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be true that we can't have a slice larger than i32::MAX - 1 bytes in this situation; I'm not convinced that is really true though. Regardless, this function takes a pointer and a length, not a slice, so there's no reason to think that len: usize is constrained in the way a slice length would be. So I think the chunking should be employed and the dangerous casts should be removed. (Further, clippy lints should be employed to prohibit this use of narrowing as.)

@briansmith
Copy link
Contributor

@newpavlov If you'd like, I could share a PR that implements the _uninit version using the suggestions above that attempt to minimize the amount of unsafe used. LMK.

if a caller only has a pointer and a length, the caller can construct the MaybeUninit slice from that, and then use the _uninit function.

Maybe this is only true of the length is small enough to be a valid slice length. But, again, the current implementation is already making such assumptions.

@notgull
Copy link

notgull commented Oct 6, 2022

In my opinion, we should go for the best of both worlds here. Have a raw pointer API and a MaybeUninit<u8> API. The raw pointer API would serve the MSRV use-case, among other cases that need raw pointers. The MaybeUninit API could be enabled on a feature to preserve the MSRV and serve the other use case. The latter can be implemented as a thin wrapper around the former.

@josephlr
Copy link
Member

josephlr commented Oct 6, 2022

@briansmith @notgull I've opened #286 to figure out the consensus for increasing the MSRV of getrandom 0.2. I think raising to 1.36 wouldn't be much of a problem, but I want to be sure.

@briansmith
Copy link
Contributor

In my opinion, we should go for the best of both worlds here. Have a raw pointer API and a MaybeUninit<u8> API. The raw pointer API would serve the MSRV use-case, among other cases that need raw pointers. The MaybeUninit API could be enabled on a feature to preserve the MSRV and serve the other use case. The latter can be implemented as a thin wrapper around the former.

My position is based on the following guideline: It is important to minimize the amount of unsafe in the implementation of the crate, to better leverage the language to ensure the crate's implementation is memory-safe. It is good to minimize the amount of unsafe APIs exposed to encourage users to write memory-safe code. These ideas are a large part of what makes this whole endeavor of writing Rust libraries meaningful, IMO, and the principle of maximizing memory safety is much more important than maximizing backward compatibility with years-old versions of the Rust toolchain, IMO.

@josephlr
Copy link
Member

josephlr commented Oct 7, 2022

I agree with @briansmith here, I think if we make more use of MaybeUninit inside of the crate, we can make the implementation easier to follow and reduce the amount of unsafe code.

let f: fn(&mut [u8]) -> Result<(), $crate::Error> = $path;
let slice = unsafe { ::core::slice::from_raw_parts_mut(dest, len) };
let slice = unsafe { ::core::slice::from_raw_parts_mut(dst, len) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect and fixing it would break backward compatibility.

getrandom_raw can be called with a pointer to uninitialized memory. So, we cannot legally construct a &mut [u8] slice from that pointer since we're only allowed to construct such a slice if the pointer points to initialized memory.

The function $path has a parameter of type &mut [u8] but to support the _raw API and/or the _uninit API it must take &mut [MaybeUninit<u8>] or a raw pointer and length instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution would be to have register_custom_getrandom! pre-initialize the output buffer before constructing slice, similar to what the JS code does. We could add a register_custom_getrandom_uninit! that does the same thing without the redundant zero pre-initialization that uses MaybeUninit<u8> instead.

@briansmith
Copy link
Contributor

@newpavlov If you'd like, I could share a PR that implements the _uninit version using the suggestions above that attempt to minimize the amount of unsafe used. LMK.

I put up a draft PR #291 that does that. Again, my main goal there is to minimize the amount of unsafe added and to make it clear that the new uses of unsafe are safe. I hope that if a _raw API is really needed then it could be implemented on top of the _uninit variant, instead of vice-versa.

@newpavlov
Copy link
Member Author

Closing it in favor of #291

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants