Add SPEC7: Seeding pseudo-random number generation (#180)
Under discussion at scipy/scipy#14322

---------

Co-authored-by: Sebastian Berg <[email protected]>
Co-authored-by: Pamphile Roy <[email protected]>
Co-authored-by: Lars Grüter <[email protected]>
Co-authored-by: Matt Haberland <[email protected]>
6 people authored Aug 29, 2024
1 parent cf09fa0 commit 15e7048
Showing 3 changed files with 483 additions and 0 deletions.
155 changes: 155 additions & 0 deletions spec-0007/index.md
@@ -0,0 +1,155 @@
---
title: "SPEC 7 — Seeding pseudo-random number generation"
number: 7
date: 2023-04-19
author:
- "Stéfan van der Walt <[email protected]>"
- "Sebastian Berg <[email protected]>"
- "Pamphile Roy <[email protected]>"
- "Matt Haberland <[email protected]>"
  - "Other participants in the discussion <[email protected]>"
discussion: https://github.com/scipy/scipy/issues/14322
endorsed-by:
---

## Description

Currently, libraries across the ecosystem provide various APIs for seeding pseudo-random number generation.
This SPEC suggests a unified, pragmatic API, taking into account technical and historical factors.
Adopting such a uniform API will simplify the user experience, especially for those who rely on multiple projects.

We recommend:

- standardizing the usage and interpretation of an `rng` keyword for seeding, and
- avoiding the use of global state and legacy bitstream generators.

We suggest implementing these principles by:

- deprecating uses of an existing seed argument (commonly `random_state` or `seed`) in favor of a consistent `rng` argument,
- using `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`[^no-RandomState], and
- deprecating the use of `numpy.random.seed` to control the random state.

We are primarily concerned with API uniformity, but also encourage libraries to move towards using [NumPy pseudo-random `Generator`s](https://numpy.org/doc/stable/reference/random/generator.html) because:

1. `Generator`s avoid problems associated with naïve seeding (e.g., using successive integers) via the [SeedSequence](https://numpy.org/doc/stable/reference/random/parallel.html#seedsequence-spawning) mechanism;
2. their use avoids relying on global state, which can make code execution harder to track and may cause problems in parallel processing scenarios.
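
As a minimal sketch of the first point, the snippet below derives several independent streams from a single root seed using `SeedSequence.spawn`. The root seed value and the variable names are illustrative assumptions, not part of this SPEC.

```python
import numpy as np

# One user-supplied root seed (the value is arbitrary, chosen for illustration).
root = np.random.SeedSequence(12345)

# Derive three child seed sequences, e.g. one per parallel worker.
children = root.spawn(3)
worker_rngs = [np.random.default_rng(child) for child in children]

# Each worker draws from its own stream.
first_draws = [int(rng.integers(0, 2**32)) for rng in worker_rngs]

# Re-creating the children from the same root reproduces the same streams.
replay = [
    int(np.random.default_rng(child).integers(0, 2**32))
    for child in np.random.SeedSequence(12345).spawn(3)
]
```

Each child records its position in `spawn_key`, so the streams differ from one another while remaining reproducible from the single root seed.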

[^no-RandomState]:
Note that `numpy.random.default_rng` does not accept instances of `RandomState`, so use of `RandomState` to control the seed is effectively deprecated, too.
That said, neither `np.random.seed` nor `np.random.RandomState` _themselves_ are deprecated, so they may still be used in some contexts (e.g. by developers for generating unit test data).

### Scope

This is intended as a recommendation to all libraries that allow users to control the state of a NumPy random number generator.
It is specifically targeted toward functions that currently accept `RandomState` instances via an argument other than `rng`, or allow `numpy.random.seed` to control the random state, but the ideas are more broadly applicable.
Random number generators other than those provided by NumPy could also be accommodated by an `rng` keyword, but that is beyond the scope of this SPEC.

### Concepts

- `BitGenerator`: Generates a stream of pseudo-random bits. The default generator in NumPy (`numpy.random.default_rng`) uses PCG64.
- `Generator`: Derives pseudo-random numbers from the bits produced by a `BitGenerator`.
- `RandomState`: A [legacy object in NumPy](https://numpy.org/doc/stable/reference/random/index.html), similar to `Generator`, that produces random numbers based on the Mersenne Twister.
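
The relationship between these three objects can be illustrated with a short sketch. The seed value `42` and the variable names are arbitrary choices made for this example.

```python
import numpy as np

# A BitGenerator produces raw pseudo-random bits; PCG64 is NumPy's default.
bit_gen = np.random.PCG64(42)

# A Generator turns those bits into numbers drawn from distributions.
rng = np.random.Generator(bit_gen)
sample = rng.normal(size=3)

# default_rng combines the two steps above.
default = np.random.default_rng(42)
default_sample = default.normal(size=3)

# RandomState is the legacy counterpart, built on the Mersenne Twister.
legacy = np.random.RandomState(42)
legacy_sample = legacy.normal(size=3)
```

Here `sample` and `default_sample` are identical because both come from a PCG64 bit generator seeded with the same value, whereas `legacy_sample` comes from a different algorithm and stream.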

### Constraints

NumPy, SciPy, scikit-learn, scikit-image, and NetworkX all implement pseudo-random seeding in slightly different ways.
Common keyword arguments include `random_state` and `seed`.
In practice, the seed is also often controllable using `numpy.random.seed`.
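
To illustrate why seeding through `numpy.random.seed` is fragile, consider the following sketch. `noisy_helper` is a hypothetical stand-in for any code, possibly in another library, that draws from the global state.

```python
import numpy as np


def noisy_helper():
    # Hypothetical helper that silently consumes the global random state.
    return np.random.random()


np.random.seed(0)
first = np.random.random()

np.random.seed(0)
noisy_helper()  # an unrelated call advances the shared global stream
second = np.random.random()

# Same seed, different result: the global state couples unrelated code.
changed = first != second
```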

## Implementation

Legacy behavior in packages such as scikit-learn (`sklearn.utils.check_random_state`) typically handles `None` (use the global seed state), an int (convert to `RandomState`), or a `RandomState` object.
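
The legacy pattern described above might look roughly like the sketch below. `legacy_check_random_state` is a hypothetical stand-in written for this illustration; it is not the scikit-learn implementation, and it relies on the private `np.random.mtrand._rand` attribute to reach the global `RandomState`.

```python
import numbers

import numpy as np


def legacy_check_random_state(seed):
    """Hypothetical stand-in for a legacy seed-validation helper."""
    if seed is None:
        # Fall back to the global RandomState that np.random.seed controls.
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # An int is converted to a fresh RandomState.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # An existing RandomState is used as-is.
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")
```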

Our recommendation here is a deprecation strategy which does not in _all_ cases adhere to the Hinsen principle[^hinsen],
although it could very nearly do so by enforcing the use of `rng` as a keyword argument.

[^hinsen]: The Hinsen principle states, loosely, that code should, whether executed now or in the future, return the same result, or raise an error.

The [deprecation strategy](https://github.com/scientific-python/specs/pull/180#issuecomment-1515248009) is as follows.

**Initially**, accept both `rng` and the existing `random_state`/`seed`/`...` keyword arguments.

- If both are specified by the user, raise an error.
- If `rng` is passed by keyword, validate it with `np.random.default_rng()` and use it to generate random numbers as needed.
- If `random_state`/`seed`/`...` is specified (by keyword or position, if allowed), preserve existing behavior.
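
A minimal sketch of this initial phase, assuming a keyword-only signature, is shown below. The function `shuffle_items` and the sentinel `_NOT_GIVEN` are hypothetical names introduced for illustration, and the legacy branch is a simplified stand-in for whatever a library did before.

```python
import numpy as np

_NOT_GIVEN = object()  # hypothetical sentinel: "argument omitted" vs. None


def shuffle_items(items, *, rng=_NOT_GIVEN, random_state=_NOT_GIVEN):
    """Hypothetical library function during the initial transition phase."""
    if rng is not _NOT_GIVEN and random_state is not _NOT_GIVEN:
        raise TypeError("Specify `rng` or `random_state`, not both.")

    if rng is not _NOT_GIVEN:
        # New path: validate with default_rng and use a Generator.
        return np.random.default_rng(rng).permutation(items)

    # Legacy path: existing behavior is preserved.
    if isinstance(random_state, np.random.RandomState):
        legacy = random_state
    elif random_state is _NOT_GIVEN or random_state is None:
        legacy = np.random.mtrand._rand  # global state (private attribute)
    else:
        legacy = np.random.RandomState(random_state)
    return legacy.permutation(items)
```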

**After `rng` becomes available** in all releases within the support window suggested by SPEC 0, emit warnings as follows:

- If neither `rng` nor `random_state`/`seed`/`...` is specified and `np.random.seed` has been used to set the seed, emit a `FutureWarning` about the upcoming change in behavior.
- If `random_state`/`seed`/`...` is passed by keyword or by position, treat it as before, but:

  - Emit a `DeprecationWarning` if passed by keyword, warning about the deprecation of keyword `random_state` in favor of `rng`.
  - Emit a `FutureWarning` if passed by position, warning about the change in behavior of the positional argument.
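
The two argument-passing cases of the warning phase could be sketched as follows. `shuffle_items`, `_NOT_GIVEN`, and the warning messages are hypothetical; the sketch covers only the keyword and positional cases and does not attempt to detect whether `np.random.seed` has been called.

```python
import warnings

import numpy as np

_NOT_GIVEN = object()  # hypothetical sentinel: "argument omitted" vs. None


def shuffle_items(items, *args, rng=_NOT_GIVEN, random_state=_NOT_GIVEN):
    """Hypothetical library function during the warning phase."""
    if len(args) > 1:
        raise TypeError("shuffle_items() takes at most 2 positional arguments")
    given = [bool(args), rng is not _NOT_GIVEN, random_state is not _NOT_GIVEN]
    if sum(given) > 1:
        raise TypeError("Specify the random state only once.")

    if args:
        # Positional use: the meaning of this argument will change.
        warnings.warn(
            "Positional use of `random_state` will change behavior; "
            "pass `rng` by keyword instead.",
            FutureWarning,
            stacklevel=2,
        )
        random_state = args[0]
    elif random_state is not _NOT_GIVEN:
        # Keyword use: the old name is deprecated in favor of `rng`.
        warnings.warn(
            "The keyword argument `random_state` is deprecated; use `rng`.",
            DeprecationWarning,
            stacklevel=2,
        )

    if rng is not _NOT_GIVEN:
        return np.random.default_rng(rng).permutation(items)

    # Legacy path: treat the old argument as before.
    if isinstance(random_state, np.random.RandomState):
        legacy = random_state
    elif random_state is _NOT_GIVEN or random_state is None:
        legacy = np.random.mtrand._rand  # global state (private attribute)
    else:
        legacy = np.random.RandomState(random_state)
    return legacy.permutation(items)
```

Accepting the old argument through `*args` is one simple way to tell positional from keyword use; a decorator that inspects the call, as in the transition example for this SPEC, keeps the public signature cleaner.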

**After the deprecation period**, accept only `rng`, raising an error if `random_state`/`seed`/`...` is provided.

Once the transition is complete, the function signature, with type annotations, could look like this:

```python
from collections.abc import Sequence
import numpy as np


SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator


def my_func(*, rng: RNGLike | SeedLike | None = None):
    """My function summary.

    Parameters
    ----------
    rng : `numpy.random.Generator`, optional
        Pseudorandom number generator state. When `rng` is None, a new
        `numpy.random.Generator` is created using entropy from the
        operating system. Types other than `numpy.random.Generator` are
        passed to `numpy.random.default_rng` to instantiate a `Generator`.
    """
    rng = np.random.default_rng(rng)

    ...

```

Also note the suggested language for the `rng` parameter docstring, which encourages the user to pass a `Generator` or `None`, but allows for other types accepted by `numpy.random.default_rng` (captured by the type annotation).
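
As a brief illustration of the input types covered by that annotation, the sketch below calls a stripped-down version of such a function with each of them. The function body and the seed value `7` are assumptions made for this example.

```python
import numpy as np


def my_func(*, rng=None):
    # Simplified stand-in: validate `rng`, then draw one number.
    rng = np.random.default_rng(rng)
    return rng.random()


# An existing Generator is passed through unchanged.
gen = np.random.default_rng(2024)
same_object = np.random.default_rng(gen) is gen

# None, ints, integer sequences, SeedSequence, and BitGenerator are accepted.
unseeded = my_func(rng=None)
from_int = my_func(rng=7)
from_list = my_func(rng=[1, 2, 3])
from_seed_seq = my_func(rng=np.random.SeedSequence(7))
from_bit_gen = my_func(rng=np.random.PCG64(7))
from_generator = my_func(rng=np.random.default_rng(7))
```

The last four seeded calls that use the value `7` all yield the same number, because each ends up with a PCG64 bit generator initialized from the same seed.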

### Impact

There are three classes of users, which will be affected to varying degrees.

1. Those who do not attempt to control the random state.
Their code will switch from using the unseeded global `RandomState` to using an unseeded `Generator`.
Since the underlying _distributions_ of pseudo-random numbers will not change, these users should be largely unaffected.
While _technically_ this change does not adhere to the Hinsen principle, its impact should be minimal.

2. Users of `random_state`/`seed` arguments.
Support for these arguments will be dropped eventually, but during the deprecation period, we can provide clear guidance, via warnings and documentation, on how to migrate to the new `rng` keyword.

3. Those who use `numpy.random.seed`.
The proposal will do away with that global seeding mechanism, meaning that code that relies on it would, after the deprecation period, go from being seeded to being unseeded.
To ensure that this does not go unnoticed, libraries that allowed for control of the random state via `numpy.random.seed` should raise a `FutureWarning` if `np.random.seed` has been called. (See [Code](#code) below for an example.)
To fully adhere to the Hinsen principle, these warnings should instead be raised as errors.
In response, users will have to switch from using `numpy.random.seed` to passing the `rng` argument explicitly to all functions that accept it.
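
For users in the third group, the migration might look like the sketch below. `simulate` and `resample` are hypothetical library functions that accept `rng`; the seed value is arbitrary.

```python
import numpy as np


def simulate(n, *, rng=None):
    # Hypothetical library function that accepts `rng`.
    rng = np.random.default_rng(rng)
    return rng.normal(size=n)


def resample(data, *, rng=None):
    # Hypothetical library function that accepts `rng`.
    rng = np.random.default_rng(rng)
    return rng.choice(data, size=len(data))


# Before: np.random.seed(0), followed by calls that relied on global state.
# After: create one Generator up front and pass it explicitly.
rng = np.random.default_rng(0)
data = simulate(5, rng=rng)
boot = resample(data, rng=rng)
```

Passing the same `Generator` to both calls keeps the whole pipeline reproducible from one seed without touching global state.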

### Code

As an example, consider how a SciPy function would transition from a `random_state` parameter to an `rng` parameter using a decorator.

{{< include-code "transition_to_rng.py" "python" >}}

### Core Project Endorsement

Endorsement of this SPEC means that a project intends to:

- standardize the usage and interpretation of an `rng` keyword for seeding, and
- avoid the use of global state and legacy bitstream generators.

### Ecosystem Adoption

To adopt this SPEC, a project should:

- deprecate the use of `random_state`/`seed` arguments in favor of an `rng` argument in all functions where users need to control pseudo-random number generation,
- use `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`, and
- deprecate the use of `numpy.random.seed` to control the random state.

## Notes
128 changes: 128 additions & 0 deletions spec-0007/test_transition_to_rng.py
@@ -0,0 +1,128 @@
import contextlib

import numpy as np
import pytest

from transition_to_rng import _transition_to_rng

from scipy._lib._util import check_random_state


@_transition_to_rng("random_state", position_num=1, end_version="1.17.0")
def library_function(arg1, rng=None, arg2=0):
    rng = check_random_state(rng)
    return arg1, rng.random(), arg2


@contextlib.contextmanager
def np_random_seed(seed=0):
    # Save RandomState
    rs = np.random.mtrand._rand

    # Install temporary RandomState
    np.random.mtrand._rand = np.random.RandomState(seed)

    try:
        yield
    finally:
        # Restore the original RandomState, even if the body raised
        np.random.mtrand._rand = rs


def test_positional_random_state():
    # doesn't need to warn
    library_function(1, np.random.default_rng(2384924))  # Generators still accepted

    message = "Positional use of"
    if np.random.mtrand._rand._bit_generator._seed_seq is not None:
        library_function(1, None)  # seed not set
    else:
        with pytest.warns(FutureWarning, match=message):
            library_function(1, None)  # seed set

    with pytest.warns(FutureWarning, match=message):
        library_function(1, 1)  # behavior will change

    with pytest.warns(FutureWarning, match=message):
        library_function(1, np.random.RandomState(1))  # will error

    with pytest.warns(FutureWarning, match=message):
        library_function(1, np.random)  # will error


def test_random_state_deprecated():
    message = "keyword argument `random_state` is deprecated"

    with pytest.warns(DeprecationWarning, match=message):
        library_function(1, random_state=None)

    with pytest.warns(DeprecationWarning, match=message):
        library_function(1, random_state=1)


def test_rng_correct_usage():
    library_function(1, rng=None)

    rng = np.random.default_rng(1)
    ref_random = rng.random()

    res = library_function(1, rng=1)
    assert res == (1, ref_random, 0)

    rng = np.random.default_rng(1)
    res = library_function(1, rng, arg2=2)
    assert res == (1, ref_random, 2)


def test_rng_incorrect_usage():
    with pytest.raises(TypeError, match="SeedSequence expects"):
        library_function(1, rng=np.random.RandomState(123))

    with pytest.raises(TypeError, match="multiple values"):
        library_function(1, rng=1, random_state=1)


def test_seeded_vs_unseeded():
    with np_random_seed():
        with pytest.warns(FutureWarning, match="NumPy global RNG"):
            library_function(1)

        # These tests should still pass when the global seed is set,
        # since they provide explicit `random_state` or `rng`
        test_positional_random_state()
        test_random_state_deprecated()
        test_rng_correct_usage()

    # Entirely unseeded, should proceed without warning
    library_function(1)


def test_decorator_no_positional():
    @_transition_to_rng("random_state", end_version="1.17.0")
    def library_function(arg1, *, rng=None, arg2=None):
        rng = check_random_state(rng)
        return arg1, rng.random(), arg2

    message = "keyword argument `random_state` is deprecated"
    with pytest.warns(DeprecationWarning, match=message):
        library_function(1, random_state=3)

    library_function(1, rng=123)


def test_decorator_no_end_version():
    @_transition_to_rng("random_state")
    def library_function(arg1, rng=None, *, arg2=None):
        rng = check_random_state(rng)
        return arg1, rng.random(), arg2

    # no warnings emitted
    library_function(1, rng=np.random.default_rng(235498235))
    library_function(1, random_state=np.random.default_rng(235498235))
    library_function(1, 235498235)
    with np_random_seed():
        library_function(1, None)


if __name__ == "__main__":
    import pytest

    pytest.main(["-W", "error"])