Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libast: provide rand_r fallback #806

Closed
McDutchie opened this issue Dec 21, 2024 · 5 comments
Closed

libast: provide rand_r fallback #806

McDutchie opened this issue Dec 21, 2024 · 5 comments
Labels
TODO Things to be done before releasing

Comments

@McDutchie
Copy link

ksh depends on the operating system's rand_r function for reproducible seeded $RANDOM sequences. But rand_r has been removed from the 2024 version of the POSIX standard, so we cannot rely on systems continuing to provide it. We should add a fallback implementation to libast for use on systems that don't come with it.

@McDutchie McDutchie added the TODO Things to be done before releasing label Dec 21, 2024
@dnewhall
Copy link

dnewhall commented Jan 6, 2025

I took a cursory look at this.

The most portable fix here is to replace rand_r(&rp->rand_seed) in init.c with just rand().

rand_r returns the same value range as rand and was supposed to be the thread-safe replacement for it, but it never really was (you have to do additional steps to ensure the seed value is thread-safe). There are some C linters that say to replace rand with rand_r, but that was never good advice and now it's deprecated. GNU provides nrand48_r, which is the closest thing that could be used as a modern replacement, but it isn't portable.

If thread safety is what's desired, then it needs to be completely implemented from scratch because there is no way to do thread-safe pseudo-random numbers in POSIX. If compatibility with the existing behavior is what's desired, then rand will do that because we are already calling srand to seed it in put_rand.

An additional oddity with the code is that nget_rand (and by extension get_rand) explicitly refuses to return the same number in a row. That, by definition, makes it non-random (well, less random, in this case).
(Fun fact: the Allies during WW2 were finally able to break the Enigma machine encryption in part by realizing it never returned the same value twice, thus giving the "random" values a pattern)

@dnewhall
Copy link

dnewhall commented Jan 6, 2025

If we decide to do a more thorough reimplementation, this is worth reading:
https://www.evanjones.ca/random-thread-safe.html

@McDutchie
Copy link
Author

McDutchie commented Jan 6, 2025

Thanks for looking into this. A few points:

Thread-safety is irrelevant because this project does not use threads, and never will. Parallelism is implemented by forking or spawning processes.

$RANDOM should not be confused with randomness. It is designed to give the sort of pseudorandomness that is useful for simple games, but nothing more. In fact, it is deliberately reproducible. After seeding it by assigning a known value to it (e.g. RANDOM=123), it will give the same sequence of pseudorandom numbers each time.

rand_r is used instead of rand because this is necessary to keep reproducible seeded $RANDOM sequences working in the shell, particularly in combination with virtual subshells, which require us to save and restore the intermediate seed value. Only rand_r will give us access to the intermediate seed value so we can save it. This is why we are going to have to provide a complete fallback implementation of rand_r.

It's an ancient design decision that $RANDOM never returns the same value twice in a row. This is not going to change at this point. Given that it's not actually random, this is not a problem.

(For real randomness, the dev branch, i.e. future 93u+m/1.1, has added $SRANDOM; see 00b296c, 77ae650, 4c868d6.)

@dnewhall
Copy link

dnewhall commented Jan 7, 2025

Oh, OK. That makes sense.

I've coded up a proof-of-concept replacing rand_r with nrand48 in init.c that appears to work. But should we provide a fallback in case the host implementation doesn't support nrand48? The title mentions adding it to libast; where would such a function go in that case?

@McDutchie
Copy link
Author

Using nrand48 is an interesting idea. I looked it up in the POSIX standard. Turns out it's part of a family of functions that has been in POSIX since Issue 1, and is not deprecated, so that seems safe to just use, without providing a fallback. (If a fallback does turn out to be needed, there was one for UWIN in the file src/lib/libast/uwin/rand48.c that was removed in 0ea97b9, so we could re-use that.)

In the meantime I've been working on a rand_r fallback and some related tweaks. The rand_r implementation is so trivial (all of 15 lines) that I think it's not worth doing this in libast. I'll just incorporate it straight into ksh93/sh/init.c. I will apply that first to the dev branch and the 1.0 branch, closing this issue.

After that I would like to see a PR with your nrand48 proof of concept, if you like (which would also remove the rand_r fallback again as rand_r would be unused). We can test that on the dev branch and it can become part of the future 93u+m/1.1 release.

McDutchie added a commit that referenced this issue Jan 7, 2025
As of the referenced commit, the rand_r(3) function is used for
$RANDOM instead of rand(3). This is necessary to keep repeatable
seeded $RANDOM sequences working in the shell -- particularly in
combination with virtual subshells, which require us to save and
restore the intermediate seed value. Only rand_r will allow us to
access and control that seed value so we can save it.

However, POSIX removed rand_r in the 2024 edition, so we cannot
count on operating systems continuing to provide it forever.

src/lib/libast/features/{lib,sys}:
- Add lib test for rand_r, checking if it exists in OS libraries.
  Defines _lib_rand_r as 1 if found.
- Add extern test for rand_r, checking if the extern declaration is
  found in the system headers and emitting one if not. This is
  needed because some operating systems (e.g., NetBSD) keep
  deprecated functions in their libraries for ABI compatibility
  purposes while hiding or removing their header declarations.

src/cmd/ksh93/sh/{init,subshell}.c:
- Add rand_r fallback implementation (nicked from FreeBSD libc) if
  !_lib_rand_r (i.e., rand_r is not found in system libraries).
- Since this is a static function (no need for extern here as it's
  only used in init.c), use a #define to rename it to _ksh_rand_r
  to avoid a conflict with the extern declaration.
- Eliminate pointless srand(3) calls. It seeds rand(3) sequences;
  it's irrelevant to rand_r(3) because we give it a pointer to our
  own seed variable! All we need to do is assign to that.
- Eliminate an init-time loop that calls rand(3) to check whether
  it returns large values. This is not inefficient, and now invalid
  in case we provide our own rand_r whose range may differ from the
  OS's rand. Determine this at compile time instead, using the
  RAND_MAX macro (which we redefine when providing our own rand_r).

Resolves: #806
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO Things to be done before releasing
Projects
None yet
Development

No branches or pull requests

2 participants