Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHANGE: Make <SmallRng as SeedableRng>::Seed the same type on 32 and 64 bit platforms #1285

Closed
thomcc opened this issue Feb 11, 2023 · 16 comments · Fixed by #1368
Closed

CHANGE: Make <SmallRng as SeedableRng>::Seed the same type on 32 and 64 bit platforms #1285

thomcc opened this issue Feb 11, 2023 · 16 comments · Fixed by #1368

Comments

@thomcc
Copy link

thomcc commented Feb 11, 2023

Summary

Change SmallRng's implementation of SeedableRng so that the Seed is the same type and size on all targets (currently, <SmallRng as SeedableRng>::Seed is a [u8; 32] if cfg(target_pointer_width = "64") and a [u8; 16] if cfg(target_pointer_width = "32")).

This would be a breaking change, which will need to wait for the next semver-major release (currently that would be 0.9)

Details

There are a few ways this may be implemented that would satisfy me.

  1. Use the same underlying RNGs as are currently used, and increase the size of the seed used on 32 bit platforms to [u8; 32]. This is double the size the RNG actually needs, so for use it would need to be combined somehow (or have a portion discarded). There are many possible ways to do this, and the details do not seem particularly important.
  2. Use the same underlying RNGs as are currently used, and decrease the size of the seed used on 64 bit platforms to [u8; 16]. This is half the size the RNG actually needs, so it would need to be stretched into a [u8; 32] before use. As before, there are ways to do this,
  3. Switch the underlying RNG used under either (or both) target configurations to something that is consistent on both, or at least uses the same seed size/argument on both.

Of these I think 1 or 3 is the best, but do not feel particularly strongly.

The first two have the drawback that the seed would no longer corresponds directly to the state (or to the seed of the wrapped RNG). This seems to be allowed by the documentation of SeedableRng::from_seed already, and SmallRng being an abstraction probably gives you leeway to do things like it, but I suppose it is a downside.

The last one has a downside that it may be difficult to find algorithms that run well on 32 and 64 bit platforms (although I don't know that I believe this).

Motivation

rand::rngs::SmallRng's public API currently contains a portability hazard. Specifically, if you use SmallRng::from_seed, you need to pass a [u8; 32] on 64 bit targets and a [u8; 16] on 32 bit ones. Users who are writing portable code must notice this difference and either avoid the function or change argument passed into from_seed based on a #[cfg].

The fact that a different algorithm is used on different platforms is mentioned in the documentation, but it is not made clear that this impacts the API in a way other than the sequence of values output by the RNG. To me (and I could have misread the intent), the documentation seemed to indicate that the underlying RNG should considered an implementation detail (which is good, if I wanted a specific RNG implementation, I'd use that directly), implying I shouldn't have to worry about it (except beyond not expecting identical results across platforms).

I hit this as a compilation failure on 32 bit platforms in rust-lang/rust#104658 when updating the Rust standard library's version of rand used in benchmarks and tests, and found it quite surprising. I ended up switching to rand_xorshift to provide the RNG implementation, because it was less of a portability headache (even though I didn't care about the underlying algorithm).

I'd have preferred to continue using SmallRng (and had I noticed seed_from_u64 perhaps I'd have used that), but it still strikes me as pretty odd to have something like this in the public API on what's already intended to be a wrapper that abstracts away from the specific underlying algorithm — IOW, it feels like an abstraction leak.

Given how popular rand is, and the fact that most developer machines are 64 bit these days... I suspect there's some amount of rand users who have code which does not compile on 32 bit platforms as a result, which seems unfortunate.

Alternatives

Several alternatives for how to fix this are given under "Details", above. Those all seem pretty good to me, and relatively simple.

Aside from those, another option would be to overhaul the SeedableRng API to help avoid this sort of issue. That may help, but in general this kind of issue can crop up anywhere cfg is used unless care is taken to avoid it... so I don't know that it's an issue with SeedableRng in particular.

Finally, some alternatives which I don't find particularly compelling:

  1. Try to fix it with documentation.
  2. Do nothing. Users concerned with portability should use a different API or match the seed types with cfg.
  3. Come up with a way to move the error to runtime, but change nothing else. (I'm not sure what this one would look like specifically, but it seems disappointing)

P.S. Sorry if this has been discussed, I did a search and it did not come up.

@dhardy
Copy link
Member

dhardy commented Feb 12, 2023

Thanks for the detailed issue. I agree with the motivation. We are accepting breaking changes now (though 0.9 could still take a while to arrive).

I suggest option (1) and simply discarding excess bytes. Wasteful, but simple and straightforward (one should also not assume that somehow mashing extra bytes together improves the seeding quality in any way). The one problem with this approach would be if someone supplied a seed by casting converting a small integer into the seed, but I think documentation is clear that seed_from_u64 should be used instead for this purpose.

The first two have the drawback that the seed would no longer corresponds directly to the state (or to the seed of the wrapped RNG).

There is no guarantee that it does anyway, nor probably any benefit. For saving/reloading an RNG, serde is probably the only option.

@vks
Copy link
Collaborator

vks commented Feb 12, 2023

rand::rngs::SmallRng's public API currently contains a portability hazard.

Arguably, StdRng and SmallRng are portability hazards by design, because the algorithms may change. I think the docs also recommend using one of the rand_* crates if portability / reproducibility is required.

Note that option 1. and 2. still don't give you reproducibility, which makes me wonder what your use case is? You want determinism (so from_entropy is out), but no reproducibility across platforms. Why not use seed_from_u64 then?

Another option would be to accept &[u8] as a seed, but then we would need to generalize the logic in seed_from_u64 for all cases where the length is shorter than the seed required by the RNG.

@dhardy
Copy link
Member

dhardy commented Feb 13, 2023

@vks has a very good point here: we directly support four types of seeding: from_entropy, from_rng, seed_from_u64, rand_seeder::Seeder. All of these cases should be portable across architectures (though if you want to store the seed somewhere you'll need generics in your struct).

So why isn't this good enough?

@thomcc
Copy link
Author

thomcc commented Feb 14, 2023

Note that option 1. and 2. still don't give you reproducibility, which makes me wonder what your use case is? You want determinism (so from_entropy is out), but no reproducibility across platforms. Why not use seed_from_u64 then?

I don't need reproducibility for my case. I could have used seed_from_u64, but did not notice it. The fact that there's a portable function doesn't mean that the non-portable version isn't a footgun.

My use case was getting the lib{std,core,alloc} test suites to run on targets unsupported by getrandom (for a long time we were stuck at rand 0.7 due to this, since some of those targets are tier-2 and thus require that the test suites build), and had reached for a manually-seeded SmallRng. After hitting this snag, I ended up depending on rand_xorshift which we had already been using elsewhere. I'll admit that this use case is fairly odd.

However, I've also used SmallRng with hard-coded seeds in benchmarks before, to avoid variation between runs. Because benchmarks are meaningless to compare across systems anyway, the fact that it has a different algorithm on different systems does not matter. TBH, I think a lot of Rust code does not care about the actual values out by the RNG, even if they provide an explicit seed, but I suspect they care about portability a lot more.

Arguably, StdRng and SmallRng are portability hazards by design, because the algorithms may change.

The algorithm changing does not mean they're portability hazards. Plenty of applications do not need exact reproducibility, and it's very sensible to ask users to choose specific RNG algorithm if they care about exact reproducibility.

The way I'd think of this is like std::path::Path, which is a portable API even though it behaves differently on different OSes. Contrast that with std::os::windows which fails to compile on some OSes, and is non-portable. However, neither std::path::Path nor std::os::windows are what I'd call a portability hazard (unlike the current SmallRng::from_seed), since std::os::windows indicates in its name that it is non-portable (and even the set of targets it supports).

This is unlike SmallRng::from_seed, since there's no indication that SmallRng will have API differences on different platforms, and given that it's already an abstraction around a concrete algorithm, it feels like having the seed vary depend on that is a case where the abstraction has leaked.

Another option would be to accept &[u8] as a seed, but then we would need to generalize the logic in seed_from_u64 for all cases where the length is shorter than the seed required by the RNG.

This seems acceptable, I had considered accepting a &[u8], but not "compressing" overlong inputs (I had assumed this would lead to erroring on improperly-sized seed, which would be worse). This requires a bunch of changes to the SeedableRng API though, and the design of that would take its own thread and likely much more effort.

For cryptographic RNGs, if you do this though it would have to be done with a cryptographic hash algorithm to preserve the entropy, and I suspect even then they like the guarantee that they get the exactly correct amount of entropy).

@thomcc
Copy link
Author

thomcc commented Feb 14, 2023

All of these cases should be portable across architectures (though if you want to store the seed somewhere you'll need generics in your struct).

I think my answer is that it's extremely non-obvious that the from_seed API is not portable across architectures. For portable code (and in my experience portable code is the norm in Rust, not the exception), it would always be the wrong choice to use.

It is very easy for Rust code to use this, not realize it will cause their code to fail to compile on certain architectures, and cause problems later, perhaps for downstream users (ones who may not even need to use the method being invoked). I will note that while the lack of documentation exacerbates this, no amount of documentation will stop it.

It's not clear to me why to even have the SmallRng type if it's not going to abstract away from the underlying RNG anyway — preventing stuff like this seems like it would be the entire point of that type.

@TheIronBorn
Copy link
Contributor

The algorithm changing does not mean they're portability hazards

If the algorithm changes on a platform the seed type may change which could break some builds

@thomcc
Copy link
Author

thomcc commented Feb 15, 2023

Yes, which is what this issue is about... That's not a necessary property of changing the algorithm.

@dhardy
Copy link
Member

dhardy commented Feb 15, 2023

Interesting point.

@thomcc for your tests I recommend using seed_from_u64. This should produce good results even with a simple seed like 0; using from_seed([0; _]) may not (depending on the RNG).

Regarding the APIs then, I feel like perhaps we should consider from_seed an implementation detail and not expose this method on SmallRng or StdRng (though it should be exposed for named algorithms like ChaCha12Rng). This requires a change in traits:

pub trait FromSeed: Sized {
    type Seed: Sized + Default + AsMut<[u8]>;
    fn from_seed(seed: Self::Seed) -> Self;
}

pub trait SeedableRng {
    fn seed_from_u64(mut state: u64) -> Self;

    fn from_rng<R: RngCore>(mut rng: R) -> Result<Self, Error>;

    fn from_entropy() -> Self;
}

impl<R: FromSeed> SeedableRng for R {
    // rand_core provides this ...
}

// direct impl for SmallRng (which is a wrapper type, not a type-def, and does not impl FromSeed)
// (same for StdRng)
impl SeedableRng for SmallRng {
    // ...
}

@newpavlov
Copy link
Member

Maybe we should simply use Xoshiro128++ on all platforms? Yes, it will be less optimal on 64-bit platforms, but it will resolve all potential pitfalls regarding reproducibility and portability hazards. Maybe there are lightweight PRNGs which can be implemented without lose of efficiency on both 32 and 64 bit platforms?

@vks
Copy link
Collaborator

vks commented Feb 15, 2023

@dhardy I like your proposal! It breaks use cases where you need determinism and unpredictability, but those are arguably better served by using rand_chacha directly.

I think it's nice to discourage using from_seed, because it's too low-level for most use cases.

@dhardy
Copy link
Member

dhardy commented Feb 16, 2023

@newpavlov the main point is to let us replace the algorithm behind SmallRng in the future.

It breaks use cases where you need determinism

Assuming you mean reproducibility, it doesn't, because for that you need to specify the RNG directly anyway.

I'd almost like to drop the FromSeed trait and make from_seed an inherent function, but then RNG impls would have to implement SeedableRng directly, which isn't worth it.

@newpavlov
Copy link
Member

@dhardy
I think we need to decide clear goals for StdRng and SmallRng. Right now we reserve the right to change underlying algorithms even in patch versions (though seed sizes fall under API stability guarantees). According to the current docs the OP problem is a non-issue, since it's a clear misuse of SmallRng.

But I guess many users look at StdRng and SmallRng as "default" PRNGs (i.e. i-don't-care-about-details-PRNG), so they expect the standard PRNG properties: reproducibility and portability.

So I think the question is: should we modify goals of StdRng and SmallRng to satisfy the intuitive (mis)understanding of users or leave everything as-is?

@dhardy
Copy link
Member

dhardy commented Feb 18, 2023

@newpavlov that's another topic, but it's right there in the docs: "The algorithm is deterministic but should not be considered reproducible [..]". Motivating your argument with "But I guess [..]" isn't exactly persuasive.

So, no, I don't think we should modify the goals. Also, I think we shouldn't break reproducibility of anything deterministic in a patch release without strong motivation (e.g. a security issue).

And none of this has anything to do with this issue unless we commit Rand to never change the algorithms behind those RNGs which I don't want to do (we've changed both RNGs in the past and might find another performance bump or other reason to switch algorithms in the future).

@dhardy
Copy link
Member

dhardy commented Oct 31, 2023

If the refine RFC were part of the base language, this might not be a problem: essentially, associated types get #[refine] by default while things like RPITIT and safe-unsafe-trait-fns don't. But I'm not aware of any initiative to support non-refined associated types in Rust.

@dhardy
Copy link
Member

dhardy commented Nov 20, 2023

Regarding my idea above, it has a couple of problems:

  1. Some RNGs, e.g. those in rand_xoshiro, provide their own impls of fn seed_from_u64 instead of using the default version. We would probably need to redundantly declare this on both FromSeed (allowing override) and on SeedableRng (allowing usage for SmallRng and anywhere where only this trait is available).
  2. from_seed is legitimately a useful function for users in some cases, not just an implementation detail. So users sometimes need FromSeed and sometimes SeedableRng; this doesn't seem like a nice API.

My feeling is that we should simply not fix this (except with documentation). Acceptable?

@newpavlov
Copy link
Member

My feeling is that we should simply not fix this (except with documentation). Acceptable?

Yes. I think we should explicitly state that users should not rely on stability of SmallRng output across different targets and even versions of rand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants