Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SPEC7: Seeding pseudo-random number generation #180

Merged
merged 57 commits into from
Aug 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
57 commits
Select commit Hold shift + click to select a range
b233242
Add draft SPEC7: seeding pseudo-random number generation
stefanv Apr 19, 2023
1e0d3f9
Add deprecation strategy
stefanv Apr 21, 2023
009453f
Explain why globally seeded code will become unseeded
stefanv Apr 25, 2023
cfa4fd8
Integrate some reviewer feedback
stefanv Jun 4, 2024
9bf299a
Adjust based on https://github.com/scientific-python/specs/pull/180#d…
stefanv Jun 4, 2024
fb64c31
Fix typo
stefanv Jun 4, 2024
d55c0b2
Add code snippet to SPEC 7 (as proposed for SciPy)
seberg Jun 4, 2024
935fed2
Add a library function example
seberg Jun 4, 2024
a17eba9
Apply suggestions from code review
seberg Jun 6, 2024
0703ee8
Fix implementation note about random.seed()
seberg Jun 6, 2024
3fd8fc7
Merge pull request #1 from seberg/nep7-prng
stefanv Jul 2, 2024
7c163df
Add Pamphile as co-author
stefanv Jul 12, 2024
e2d6541
Add @lagru's keyword-only suggestion
stefanv Jul 12, 2024
9894af0
Appease linter
stefanv Jul 12, 2024
8d89cc8
Add @ilayn's suggestion to separate high-level goals and technical re…
stefanv Jul 12, 2024
151d68d
Add type annotation
stefanv Jul 12, 2024
5491549
How to transition away from np.random.seed
stefanv Jul 12, 2024
f4cf078
Using random_state kwd will eventually raise an error
stefanv Jul 12, 2024
9d60785
Clarify language describing decorator
stefanv Jul 12, 2024
7531111
Add @mdhaber's docstring to the example function
stefanv Jul 12, 2024
c196b74
Assorted tweaks and decorator code generalization
mdhaber Jul 13, 2024
415f173
Adjustments per review
mdhaber Jul 13, 2024
efc6428
Correct decorator
mdhaber Jul 15, 2024
83a1d92
Apply suggestions from code review
stefanv Jul 16, 2024
7a4258e
Pull code into external file
stefanv Jul 17, 2024
4679d40
Mention that type annotation describes additional types
stefanv Jul 17, 2024
14ab4a4
DOC: tweak
mdhaber Jul 17, 2024
07a2235
MAINT: add common message about different behavior of default_rng
mdhaber Jul 17, 2024
07744f2
Apply suggestions from code review
mdhaber Jul 17, 2024
be41cd6
Update spec-0007/transition_to_rng.py
mdhaber Jul 17, 2024
4b1c1c4
Merge pull request #2 from mdhaber/nep7-prng
stefanv Jul 17, 2024
a6e1caa
Fix unterminated string
stefanv Jul 17, 2024
38341b1
Add tests for _transition_to_rng
stefanv Jul 17, 2024
63a7bf5
Test for multiple specifications of rng/random_state
stefanv Jul 17, 2024
ef35bb9
General transition_to_rng docstring edits
stefanv Jul 17, 2024
4f8a441
Rename NEW_NAME to caps, as suggested by @mdhaber
stefanv Jul 17, 2024
e6cf422
Add tests for keyword-only decorator usage
stefanv Jul 17, 2024
a7d844f
Move Hinsen footnote closer to mention
stefanv Jul 17, 2024
83b28b6
Indent list in numerated list
stefanv Jul 17, 2024
cef9c51
Avoid usage of overloaded term "recommend"
stefanv Jul 17, 2024
f894a7a
More careful phrasing around Hinsen principle
stefanv Jul 17, 2024
044ff76
Apply @mdhaber's suggestions from code review
stefanv Jul 18, 2024
27934c7
MAINT: change optional dep_version to required end_version
mdhaber Jul 20, 2024
636bc5d
MAINT: some positional arguments should not warn
mdhaber Jul 20, 2024
bd32f7b
DOC: describe other cases in which a warning is emitted when arg is p…
mdhaber Jul 20, 2024
6ec3b6a
MAINT: specify when warnings should begin to be emitted; make decorat…
mdhaber Aug 7, 2024
4aa7051
Compute cmn_msg at import time to save work
mdhaber Aug 7, 2024
37ff883
Clarify extent of deprecations
mdhaber Aug 7, 2024
2dcf128
Move motivation section
mdhaber Aug 7, 2024
cc43293
Show example 'library_function' in different stages
mdhaber Aug 7, 2024
7b4c61b
Adjust tests
mdhaber Aug 7, 2024
ab3f448
Apply suggestions from code review
mdhaber Aug 26, 2024
0ae7919
Merge pull request #3 from mdhaber/nep7-prng
stefanv Aug 27, 2024
b7ddc1f
Improve motivation; general edits
stefanv Aug 27, 2024
81da885
Apply pre-commit changes
stefanv Aug 27, 2024
b106497
Merge remote-tracking branch 'origin/main' into nep7-prng
stefanv Aug 27, 2024
1180190
Edits based on @lagru's feedback
stefanv Aug 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
155 changes: 155 additions & 0 deletions spec-0007/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,155 @@
---
title: "SPEC 7 — Seeding pseudo-random number generation"
number: 7
date: 2023-04-19
author:
- "Stéfan van der Walt <[email protected]>"
- "Sebastian Berg <[email protected]>"
stefanv marked this conversation as resolved.
Show resolved Hide resolved
- "Pamphile Roy <[email protected]>"
- "Matt Haberland <[email protected]>"
- Other participants in the discussion <[email protected]>"
tupui marked this conversation as resolved.
Show resolved Hide resolved
discussion: https://github.com/scipy/scipy/issues/14322
endorsed-by:
---

## Description

Currently, libraries across the ecosystem provide various APIs for seeding pseudo-random number generation.
This SPEC suggests a unified, pragmatic API, taking into account technical and historical factors.
Adopting such a uniform API will simplify the user experience, especially for those who rely on multiple projects.

We recommend:

- standardizing the usage and interpretation of an `rng` keyword for seeding, and
- avoiding the use of global state and legacy bitstream generators.

We suggest implementing these principles by:

- deprecating uses of an existing seed argument (commonly `random_state` or `seed`) in favor of a consistent `rng` argument,
- using `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`[^no-RandomState], and
- deprecating the use of `numpy.random.seed` to control the random state.

We are primarily concerned with API uniformity, but also encourage libraries to move towards using [NumPy pseudo-random `Generator`s](https://numpy.org/doc/stable/reference/random/generator.html) because:

1. `Generator`s avoid problems associated with naïve seeding (e.g., using successive integers), via its [SeedSequence](https://numpy.org/doc/stable/reference/random/parallel.html#seedsequence-spawning) mechanism;
2. their use avoids relying on global state—which can make code execution harder to track, and may cause problems in parallel processing scenarios.

[^no-RandomState]:
Note that `numpy.random.default_rng` does not accept instances of `RandomState`, so use of `RandomState` to control the seed is effectively deprecated, too.
That said, neither `np.random.seed` nor `np.random.RandomState` _themselves_ are deprecated, so they may still be used in some contexts (e.g. by developers for generating unit test data).

### Scope

This is intended as a recommendation to all libraries that allow users to control the state of a NumPy random number generator.
It is specifically targeted toward functions that currently accept `RandomState` instances via an argument other than `rng`, or allow `numpy.random.seed` to control the random state, but the ideas are more broadly applicable.
Random number generators other than those provided by NumPy could also be accommodated by an `rng` keyword, but that is beyond the scope of this SPEC.

### Concepts

- `BitGenerator`: Generates a stream of pseudo-random bits. The default generator in NumPy (`numpy.random.default_rng`) uses PCG64.
- `Generator`: Derives pseudo-random numbers from the bits produced by a `BitGenerator`.
- `RandomState`: a [legacy object in NumPy](https://numpy.org/doc/stable/reference/random/index.html), similar to `Generator`, that produces random numbers based on the Mersenne Twister.

### Constraints

NumPy, SciPy, scikit-learn, scikit-image, and NetworkX all implement pseudo-random seeding in slightly different ways.
Common keyword arguments include `random_state` and `seed`.
In practice, the seed is also often controllable using `numpy.random.seed`.

## Implementation

Legacy behavior in packages such as scikit-learn (`sklearn.utils.check_random_state`) typically handle `None` (use the global seed state), an int (convert to `RandomState`), or `RandomState` object.

Our recommendation here is a deprecation strategy which does not in _all_ cases adhere to the Hinsen principle[^hinsen],
although it could very nearly do so by enforcing the use of `rng` as a keyword argument.

[^hinsen]: The Hinsen principle states, loosely, that code should, whether executed now or in the future, return the same result, or raise an error.

The [deprecation strategy](https://github.com/scientific-python/specs/pull/180#issuecomment-1515248009) is as follows.

**Initially**, accept both `rng` and the existing `random_state`/`seed`/`...` keyword arguments.

- If both are specified by the user, raise an error.
- If `rng` is passed by keyword, validate it with `np.random.default_rng()` and use it to generate random numbers as needed.
- If `random_state`/`seed`/`...` is specified (by keyword or position, if allowed), preserve existing behavior.

**After `rng` becomes available** in all releases within the support window suggested by SPEC 0, emit warnings as follows:

- If neither `rng` nor `random_state`/`seed`/`...` is specified and `np.random.seed` has been used to set the seed, emit a `FutureWarning` about the upcoming change in behavior.
- If `random_state`/`seed`/`...` is passed by keyword or by position, treat it as before, but:

- Emit a `DeprecationWarning` if passed by keyword, warning about the deprecation of keyword `random_state` in favor of `rng`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Emit a `DeprecationWarning` if passed by keyword, warning about the deprecation of keyword `random_state` in favor of `rng`.
- Emit a `FutureWarning` if passed by keyword, warning about the deprecation of keyword `random_state` in favor of `rng`.

Not sure why using the deprecated keyword shouldn't also be a FutureWarning when using the deprecated position is. It's both aimed at end users (Python warning categories).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let @mdhaber take that one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not familiar with that being the official distinction. This was written based on what I've seen in SciPy.

In our release notes, there is a standard note about users checking for DeprecationWarnings:

Before upgrading, we recommend that users check that their own code does not use deprecated SciPy functionality (to do so, run your code with python -Wd and check for DeprecationWarnings).

In SciPy, I have always seen a DeprecationWarning when something is going to raise an error in the future (i.e. in line with Hinsen principle) and a FutureWarning when something is going to change behavior but not raise an error (i.e. in violation of the Hinsen principle).

Often DeprecationWarnings are used for functions and keywords being removed. FutureWarnings are a lot less common. Here's an example of a decision made between the two.

In this case, a keyword is being removed in the sense that if a user passes random_state=10 they will get an error, so SciPy would probably emit a DeprecationWarning. If they pass 10 to random_state by position, on the other hand, they will not get an error in the future, but the behavior will change, so SciPy would probably emit a FutureWarning.

I suppose we can mention that projects are welcome to follow their conventions about which type of warning to emit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main important distinction here seem to be: by default, users won't see the DeprecationWarnings (without explicitly hunting for them with -Wd), whereas they will see FutureWarnings "out the box".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's usually why we default to FutureWarning in scikit-image. Though, I'm happy for this SPEC to recommend whatever. 😉

- Emit a `FutureWarning` if passed by position, warning about the change in behavior of the positional argument.

**After the deprecation period**, accept only `rng`, raising an error if `random_state`/`seed`/`...` is provided.

By now, the function signature, with type annotations, could look like this:

```python
from collections.abc import Sequence
import numpy as np


SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator


stefanv marked this conversation as resolved.
Show resolved Hide resolved
def my_func(*, rng: RNGLike | SeedLike | None = None):
"""My function summary.

Parameters
----------
rng : `numpy.random.Generator`, optional
Pseudorandom number generator state. When `rng` is None, a new
`numpy.random.Generator` is created using entropy from the
operating system. Types other than `numpy.random.Generator` are
passed to `numpy.random.default_rng` to instantiate a `Generator`.
"""
rng = np.random.default_rng(rng)

...

```

Also note the suggested language for the `rng` parameter docstring, which encourages the user to pass a `Generator` or `None`, but allows for other types accepted by `numpy.random.default_rng` (captured by the type annotation).

### Impact

There are three classes of users, which will be affected to varying degrees.

1. Those who do not attempt to control the random state.
Their code will switch from using the unseeded global `RandomState` to using an unseeded `Generator`.
Since the underlying _distributions_ of pseudo-random numbers will not change, these users should be largely unaffected.
While _technically_ this change does not adhere to the Hinsen principle, its impact should be minimal.

2. Users of `random_state`/`seed` arguments.
Support for these arguments will be dropped eventually, but during the deprecation period, we can provide clear guidance, via warnings and documentation, on how to migrate to the new `rng` keyword.

3. Those who use `numpy.random.seed`.
The proposal will do away with that global seeding mechanism, meaning that code that relies on it would, after the deprecation period, go from being seeded to being unseeded.
To ensure that this does not go unnoticed, libraries that allowed for control of the random state via `numpy.random.seed` should raise a `FutureWarning` if `np.random.seed` has been called. (See [Code](#code) below for an example.)
To fully adhere to the Hinsen principle, these warnings should instead be raised as errors.
In response, users will have to switch from using `numpy.random.seed` to passing the `rng` argument explicitly to all functions that accept it.

### Code

As an example, consider how a SciPy function would transition from a `random_state` parameter to an `rng` parameter using a decorator.

{{< include-code "transition_to_rng.py" "python" >}}

### Core Project Endorsement

Endorsement of this SPEC means that a project intends to:

- standardize the usage and interpretation of an `rng` keyword for seeding, and
- avoid the use of global state and legacy bitstream generators.

### Ecosystem Adoption

To adopt this SPEC, a project should:

- deprecate the use of `random_state`/`seed` arguments in favor of an `rng` argument in all functions where users need to control pseudo-random number generation,
- use `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`, and
- deprecate the use of `numpy.random.seed` to control the random state.

## Notes
128 changes: 128 additions & 0 deletions spec-0007/test_transition_to_rng.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
import contextlib

import numpy as np
import pytest

from transition_to_rng import _transition_to_rng

from scipy._lib._util import check_random_state


@_transition_to_rng("random_state", position_num=1, end_version="1.17.0")
def library_function(arg1, rng=None, arg2=0):
rng = check_random_state(rng)
return arg1, rng.random(), arg2


@contextlib.contextmanager
def np_random_seed(seed=0):
# Save RandomState
rs = np.random.mtrand._rand

# Install temporary RandomState
np.random.mtrand._rand = np.random.RandomState(seed)

yield

np.random.mtrand._rand = rs


def test_positional_random_state():
# doesn't need to warn
library_function(1, np.random.default_rng(2384924)) # Generators still accepted

message = "Positional use of"
if np.random.mtrand._rand._bit_generator._seed_seq is not None:
library_function(1, None) # seed not set
else:
with pytest.warns(FutureWarning, match=message):
library_function(1, None) # seed set

with pytest.warns(FutureWarning, match=message):
library_function(1, 1) # behavior will change

with pytest.warns(FutureWarning, match=message):
library_function(1, np.random.RandomState(1)) # will error

with pytest.warns(FutureWarning, match=message):
library_function(1, np.random) # will error


def test_random_state_deprecated():
message = "keyword argument `random_state` is deprecated"

with pytest.warns(DeprecationWarning, match=message):
library_function(1, random_state=None)

with pytest.warns(DeprecationWarning, match=message):
library_function(1, random_state=1)


def test_rng_correct_usage():
library_function(1, rng=None)

rng = np.random.default_rng(1)
ref_random = rng.random()

res = library_function(1, rng=1)
assert res == (1, ref_random, 0)

rng = np.random.default_rng(1)
res = library_function(1, rng, arg2=2)
assert res == (1, ref_random, 2)


def test_rng_incorrect_usage():
with pytest.raises(TypeError, match="SeedSequence expects"):
library_function(1, rng=np.random.RandomState(123))

with pytest.raises(TypeError, match="multiple values"):
library_function(1, rng=1, random_state=1)


def test_seeded_vs_unseeded():
with np_random_seed():
with pytest.warns(FutureWarning, match="NumPy global RNG"):
library_function(1)

# These tests should still pass when the global seed is set,
# since they provide explicit `random_state` or `rng`
test_positional_random_state()
test_random_state_deprecated()
test_rng_correct_usage()

# Entirely unseeded, should proceed without warning
library_function(1)


def test_decorator_no_positional():
@_transition_to_rng("random_state", end_version="1.17.0")
def library_function(arg1, *, rng=None, arg2=None):
rng = check_random_state(rng)
return arg1, rng.random(), arg2

message = "keyword argument `random_state` is deprecated"
with pytest.warns(DeprecationWarning, match=message):
library_function(1, random_state=3)

library_function(1, rng=123)


def test_decorator_no_end_version():
@_transition_to_rng("random_state")
def library_function(arg1, rng=None, *, arg2=None):
rng = check_random_state(rng)
return arg1, rng.random(), arg2

# no warnings emitted
library_function(1, rng=np.random.default_rng(235498235))
library_function(1, random_state=np.random.default_rng(235498235))
library_function(1, 235498235)
with np_random_seed():
library_function(1, None)


if __name__ == "__main__":
import pytest

pytest.main(["-W", "error"])
Loading