Add SPEC7: Seeding pseudo-random number generation (#180)
Under discussion at scipy/scipy#14322

---------

Co-authored-by: Sebastian Berg <[email protected]>
Co-authored-by: Sebastian Berg <[email protected]>
Co-authored-by: Pamphile Roy <[email protected]>
Co-authored-by: Lars Grüter <[email protected]>
Co-authored-by: Matt Haberland <[email protected]>
1 parent cf09fa0, commit 15e7048
Showing 3 changed files with 483 additions and 0 deletions.
@@ -0,0 +1,155 @@
---
title: "SPEC 7 — Seeding pseudo-random number generation"
number: 7
date: 2023-04-19
author:
  - "Stéfan van der Walt <[email protected]>"
  - "Sebastian Berg <[email protected]>"
  - "Pamphile Roy <[email protected]>"
  - "Matt Haberland <[email protected]>"
  - "Other participants in the discussion <[email protected]>"
discussion: https://github.com/scipy/scipy/issues/14322
endorsed-by:
---

## Description

Currently, libraries across the ecosystem provide various APIs for seeding pseudo-random number generation.
This SPEC suggests a unified, pragmatic API, taking into account technical and historical factors.
Adopting such a uniform API will simplify the user experience, especially for those who rely on multiple projects.

We recommend:

- standardizing the usage and interpretation of an `rng` keyword for seeding, and
- avoiding the use of global state and legacy bitstream generators.

We suggest implementing these principles by:

- deprecating uses of an existing seed argument (commonly `random_state` or `seed`) in favor of a consistent `rng` argument,
- using `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`[^no-RandomState], and
- deprecating the use of `numpy.random.seed` to control the random state.

We are primarily concerned with API uniformity, but also encourage libraries to move towards using [NumPy pseudo-random `Generator`s](https://numpy.org/doc/stable/reference/random/generator.html) because:

1. `Generator`s avoid problems associated with naïve seeding (e.g., using successive integers), via their [SeedSequence](https://numpy.org/doc/stable/reference/random/parallel.html#seedsequence-spawning) mechanism;
2. their use avoids relying on global state, which can make code execution harder to track and may cause problems in parallel processing scenarios.

[^no-RandomState]:
    Note that `numpy.random.default_rng` does not accept instances of `RandomState`, so use of `RandomState` to control the seed is effectively deprecated, too.
    That said, neither `np.random.seed` nor `np.random.RandomState` _themselves_ are deprecated, so they may still be used in some contexts (e.g. by developers for generating unit test data).

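To make the first point concrete, here is a minimal sketch of seeding several independent streams through `SeedSequence.spawn` rather than with successive integers. The root seed and variable names are arbitrary choices for illustration and are not part of this SPEC.

```python
import numpy as np

# Derive independent child seeds from a single root seed instead of
# seeding each stream with successive integers (0, 1, 2, ...).
root = np.random.SeedSequence(12345)  # arbitrary example seed
streams = [np.random.default_rng(child) for child in root.spawn(3)]
samples = [rng.random() for rng in streams]

# Repeating the procedure with the same root seed reproduces the streams.
root_again = np.random.SeedSequence(12345)
samples_again = [np.random.default_rng(c).random() for c in root_again.spawn(3)]

# `default_rng` returns an existing Generator unaltered, so a function
# can call it on whatever the user passed without resetting any state.
gen = np.random.default_rng(0)
same_gen = np.random.default_rng(gen)
```

No global state is involved: each stream is an ordinary object that can be handed to a separate worker.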
### Scope

This is intended as a recommendation to all libraries that allow users to control the state of a NumPy random number generator.
It is specifically targeted toward functions that currently accept `RandomState` instances via an argument other than `rng`, or allow `numpy.random.seed` to control the random state, but the ideas are more broadly applicable.
Random number generators other than those provided by NumPy could also be accommodated by an `rng` keyword, but that is beyond the scope of this SPEC.

### Concepts

- `BitGenerator`: Generates a stream of pseudo-random bits. The default generator in NumPy (`numpy.random.default_rng`) uses PCG64.
- `Generator`: Derives pseudo-random numbers from the bits produced by a `BitGenerator`.
- `RandomState`: a [legacy object in NumPy](https://numpy.org/doc/stable/reference/random/index.html), similar to `Generator`, that produces random numbers based on the Mersenne Twister.

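The relationship between these three objects can be sketched as follows. The seed `2023` is an arbitrary example value.

```python
import numpy as np

# A BitGenerator produces the raw pseudo-random bits.
bit_gen = np.random.PCG64(2023)  # arbitrary example seed

# A Generator turns those bits into draws from distributions.
rng = np.random.Generator(bit_gen)
values = rng.normal(size=3)

# `default_rng` builds the same pairing: a Generator wrapping PCG64.
default = np.random.default_rng(2023)
default_values = default.normal(size=3)

# The legacy RandomState exposes a similar interface, but is based on
# the Mersenne Twister and produces a different stream.
legacy = np.random.RandomState(2023)
legacy_values = legacy.normal(size=3)
```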
### Constraints

NumPy, SciPy, scikit-learn, scikit-image, and NetworkX all implement pseudo-random seeding in slightly different ways.
Common keyword arguments include `random_state` and `seed`.
In practice, the seed is also often controllable using `numpy.random.seed`.

## Implementation

Legacy behavior in packages such as scikit-learn (`sklearn.utils.check_random_state`) typically handles `None` (use the global seed state), an int (convert to `RandomState`), or a `RandomState` object.

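The legacy handling described above can be sketched roughly as follows. This is a simplified stand-in written for illustration, not the source of `sklearn.utils.check_random_state`; the name `legacy_check_random_state` is hypothetical.

```python
import numbers

import numpy as np


def legacy_check_random_state(seed):
    """Hypothetical, simplified sketch of legacy seed handling."""
    if seed is None or seed is np.random:
        # Fall back to the global RandomState singleton.
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # An int seeds a fresh RandomState.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # An existing RandomState is used as-is.
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")
```

Note that the `None` branch ties the function to global state, which is what this SPEC recommends moving away from.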
Our recommendation here is a deprecation strategy which does not in _all_ cases adhere to the Hinsen principle[^hinsen],
although it could very nearly do so by enforcing the use of `rng` as a keyword argument.

[^hinsen]: The Hinsen principle states, loosely, that code should, whether executed now or in the future, return the same result, or raise an error.

The [deprecation strategy](https://github.com/scientific-python/specs/pull/180#issuecomment-1515248009) is as follows.

**Initially**, accept both `rng` and the existing `random_state`/`seed`/`...` keyword arguments.

- If both are specified by the user, raise an error.
- If `rng` is passed by keyword, validate it with `np.random.default_rng()` and use it to generate random numbers as needed.
- If `random_state`/`seed`/`...` is specified (by keyword or position, if allowed), preserve existing behavior.

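A minimal sketch of this initial stage, assuming a hypothetical function `sample_mean` whose old argument is named `random_state`. As a simplification, `rng=None` is treated here as "not specified", so the legacy path (including the global `RandomState`) is kept whenever `rng` is not given.

```python
import numpy as np


def sample_mean(n, *, rng=None, random_state=None):
    """Hypothetical function during the initial transition stage."""
    if rng is not None and random_state is not None:
        raise TypeError("Specify only one of `rng` and `random_state`.")

    if rng is not None:
        # New path: validate `rng` with `default_rng`.
        gen = np.random.default_rng(rng)
        return float(gen.random(n).mean())

    # Legacy path: existing behavior is preserved.
    if random_state is None:
        legacy = np.random.mtrand._rand  # global RandomState
    elif isinstance(random_state, np.random.RandomState):
        legacy = random_state
    else:
        legacy = np.random.RandomState(random_state)
    return float(legacy.random_sample(n).mean())
```

The same integer therefore yields different numbers depending on which keyword receives it, which is why the two arguments are kept strictly separate.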
**After `rng` becomes available** in all releases within the support window suggested by SPEC 0, emit warnings as follows:

- If neither `rng` nor `random_state`/`seed`/`...` is specified and `np.random.seed` has been used to set the seed, emit a `FutureWarning` about the upcoming change in behavior.
- If `random_state`/`seed`/`...` is passed by keyword or by position, treat it as before, but:

  - Emit a `DeprecationWarning` if passed by keyword, warning about the deprecation of keyword `random_state` in favor of `rng`.
  - Emit a `FutureWarning` if passed by position, warning about the change in behavior of the positional argument.

**After the deprecation period**, accept only `rng`, raising an error if `random_state`/`seed`/`...` is provided.

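The last two stages might look roughly like this for a keyword-only argument. The function names are hypothetical, the old argument is assumed to be an int seed for brevity, and detection of `np.random.seed` and of positional use is left out of this sketch.

```python
import warnings

import numpy as np


def stage_two(n, *, rng=None, random_state=None):
    """Hypothetical function while deprecation warnings are emitted."""
    if random_state is not None:
        warnings.warn(
            "keyword argument `random_state` is deprecated; use `rng`",
            DeprecationWarning,
            stacklevel=2,
        )
        # The old argument is still treated as before (int seed assumed).
        return np.random.RandomState(random_state).random_sample(n)
    return np.random.default_rng(rng).random(n)


def stage_three(n, *, rng=None):
    """Hypothetical function after the deprecation period."""
    # `random_state` is no longer part of the signature, so passing it
    # raises a TypeError from Python itself.
    return np.random.default_rng(rng).random(n)
```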
At this point, the function signature, with type annotations, could look like this:

```python
from collections.abc import Sequence
import numpy as np


SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator


def my_func(*, rng: RNGLike | SeedLike | None = None):
    """My function summary.

    Parameters
    ----------
    rng : `numpy.random.Generator`, optional
        Pseudorandom number generator state. When `rng` is None, a new
        `numpy.random.Generator` is created using entropy from the
        operating system. Types other than `numpy.random.Generator` are
        passed to `numpy.random.default_rng` to instantiate a `Generator`.
    """
    rng = np.random.default_rng(rng)

    ...
```

Also note the suggested language for the `rng` parameter docstring, which encourages the user to pass a `Generator` or `None`, but allows for other types accepted by `numpy.random.default_rng` (captured by the type annotation).

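From the caller's side, the difference between passing an int and passing a `Generator` can be illustrated with a small sketch. The function `jitter` is hypothetical and only mirrors the signature above.

```python
import numpy as np


def jitter(x, *, rng=None):
    """Hypothetical function following the signature above."""
    rng = np.random.default_rng(rng)
    return x + rng.normal(scale=0.1, size=np.shape(x))


x = np.zeros(4)

# An int seed creates a fresh Generator on every call,
# so both calls return the same values.
a = jitter(x, rng=42)
b = jitter(x, rng=42)

# A shared Generator advances its state between calls,
# so successive calls return different values.
shared = np.random.default_rng(42)
c = jitter(x, rng=shared)
d = jitter(x, rng=shared)
```

This is one reason the docstring encourages passing a `Generator`: a single object can then drive a whole analysis reproducibly without repeating the same draws.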
### Impact

There are three classes of users, which will be affected to varying degrees.

1. Those who do not attempt to control the random state.
   Their code will switch from using the unseeded global `RandomState` to using an unseeded `Generator`.
   Since the underlying _distributions_ of pseudo-random numbers will not change, these users should be largely unaffected.
   While _technically_ this change does not adhere to the Hinsen principle, its impact should be minimal.

2. Users of `random_state`/`seed` arguments.
   Support for these arguments will be dropped eventually, but during the deprecation period, we can provide clear guidance, via warnings and documentation, on how to migrate to the new `rng` keyword.

3. Those who use `numpy.random.seed`.
   The proposal will do away with that global seeding mechanism, meaning that code that relies on it would, after the deprecation period, go from being seeded to being unseeded.
   To ensure that this does not go unnoticed, libraries that allowed for control of the random state via `numpy.random.seed` should raise a `FutureWarning` if `np.random.seed` has been called. (See [Code](#code) below for an example.)
   To fully adhere to the Hinsen principle, these warnings should instead be raised as errors.
   In response, users will have to switch from using `numpy.random.seed` to passing the `rng` argument explicitly to all functions that accept it.

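For the third group, the migration amounts to creating one `Generator` and passing it along explicitly. A minimal sketch, assuming a hypothetical library function `shuffle_rows` that already accepts `rng`:

```python
import numpy as np


def shuffle_rows(data, *, rng=None):
    """Hypothetical library function that accepts `rng`."""
    rng = np.random.default_rng(rng)
    return rng.permutation(data)


# Before: `np.random.seed(0)` followed by calls relying on global state.
# After: create a single Generator and pass it to every call.
rng = np.random.default_rng(0)
data = np.arange(10)
first = shuffle_rows(data, rng=rng)
second = shuffle_rows(data, rng=rng)
```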
### Code

As an example, consider how a SciPy function would transition from a `random_state` parameter to an `rng` parameter using a decorator.

{{< include-code "transition_to_rng.py" "python" >}}

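For readers who only want the general shape of such a decorator, here is a deliberately reduced sketch. It is not the code from `transition_to_rng.py`: the name `rename_seed_keyword` is hypothetical, and the sketch only renames a keyword. It does not preserve legacy `RandomState` streams, handle positional arguments, or detect `numpy.random.seed`.

```python
import functools
import warnings

import numpy as np


def rename_seed_keyword(old_name):
    """Hypothetical decorator: accept `old_name` as a deprecated alias of `rng`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                if "rng" in kwargs:
                    raise TypeError(
                        f"{func.__name__}() got multiple values for "
                        f"`rng` and `{old_name}`"
                    )
                warnings.warn(
                    f"keyword argument `{old_name}` is deprecated; use `rng`",
                    DeprecationWarning,
                    stacklevel=2,
                )
                # Forward the old keyword's value under the new name.
                kwargs["rng"] = kwargs.pop(old_name)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rename_seed_keyword("random_state")
def draw(n, *, rng=None):
    rng = np.random.default_rng(rng)
    return rng.random(n)
```

Keeping the transition logic in a decorator leaves the function body free of deprecation code, so removing the decorator at the end of the deprecation period is a one-line change.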
### Core Project Endorsement

Endorsement of this SPEC means that a project intends to:

- standardize the usage and interpretation of an `rng` keyword for seeding, and
- avoid the use of global state and legacy bitstream generators.

### Ecosystem Adoption

To adopt this SPEC, a project should:

- deprecate the use of `random_state`/`seed` arguments in favor of an `rng` argument in all functions where users need to control pseudo-random number generation,
- use `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`, and
- deprecate the use of `numpy.random.seed` to control the random state.

## Notes
@@ -0,0 +1,128 @@
import contextlib

import numpy as np
import pytest

from transition_to_rng import _transition_to_rng

from scipy._lib._util import check_random_state


@_transition_to_rng("random_state", position_num=1, end_version="1.17.0")
def library_function(arg1, rng=None, arg2=0):
    rng = check_random_state(rng)
    return arg1, rng.random(), arg2


@contextlib.contextmanager
def np_random_seed(seed=0):
    # Save RandomState
    rs = np.random.mtrand._rand

    # Install temporary RandomState
    np.random.mtrand._rand = np.random.RandomState(seed)

    try:
        yield
    finally:
        # Restore the original RandomState, even if the body raised
        np.random.mtrand._rand = rs


def test_positional_random_state():
    # doesn't need to warn
    library_function(1, np.random.default_rng(2384924))  # Generators still accepted

    message = "Positional use of"
    if np.random.mtrand._rand._bit_generator._seed_seq is not None:
        library_function(1, None)  # seed not set
    else:
        with pytest.warns(FutureWarning, match=message):
            library_function(1, None)  # seed set

    with pytest.warns(FutureWarning, match=message):
        library_function(1, 1)  # behavior will change

    with pytest.warns(FutureWarning, match=message):
        library_function(1, np.random.RandomState(1))  # will error

    with pytest.warns(FutureWarning, match=message):
        library_function(1, np.random)  # will error


def test_random_state_deprecated():
    message = "keyword argument `random_state` is deprecated"

    with pytest.warns(DeprecationWarning, match=message):
        library_function(1, random_state=None)

    with pytest.warns(DeprecationWarning, match=message):
        library_function(1, random_state=1)


def test_rng_correct_usage():
    library_function(1, rng=None)

    rng = np.random.default_rng(1)
    ref_random = rng.random()

    res = library_function(1, rng=1)
    assert res == (1, ref_random, 0)

    rng = np.random.default_rng(1)
    res = library_function(1, rng, arg2=2)
    assert res == (1, ref_random, 2)


def test_rng_incorrect_usage():
    with pytest.raises(TypeError, match="SeedSequence expects"):
        library_function(1, rng=np.random.RandomState(123))

    with pytest.raises(TypeError, match="multiple values"):
        library_function(1, rng=1, random_state=1)


def test_seeded_vs_unseeded():
    with np_random_seed():
        with pytest.warns(FutureWarning, match="NumPy global RNG"):
            library_function(1)

        # These tests should still pass when the global seed is set,
        # since they provide explicit `random_state` or `rng`
        test_positional_random_state()
        test_random_state_deprecated()
        test_rng_correct_usage()

    # Entirely unseeded, should proceed without warning
    library_function(1)


def test_decorator_no_positional():
    @_transition_to_rng("random_state", end_version="1.17.0")
    def library_function(arg1, *, rng=None, arg2=None):
        rng = check_random_state(rng)
        return arg1, rng.random(), arg2

    message = "keyword argument `random_state` is deprecated"
    with pytest.warns(DeprecationWarning, match=message):
        library_function(1, random_state=3)

    library_function(1, rng=123)


def test_decorator_no_end_version():
    @_transition_to_rng("random_state")
    def library_function(arg1, rng=None, *, arg2=None):
        rng = check_random_state(rng)
        return arg1, rng.random(), arg2

    # no warnings emitted
    library_function(1, rng=np.random.default_rng(235498235))
    library_function(1, random_state=np.random.default_rng(235498235))
    library_function(1, 235498235)
    with np_random_seed():
        library_function(1, None)


if __name__ == "__main__":
    import pytest

    pytest.main(["-W", "error"])