Config driven detectors - part 4 #497

ascillitoe · 2022-05-05T12:19:40Z

This is the fourth (and final!) part of a series of PRs for the config-driven detector functionality. The original PR (#389) has been split into a number of smaller PRs to aid the review process.

Summary of PR

This PR implements a number of final fixes and refactorings:

The preprocess_at_init and preprocess_at_pred logic implemented in Preprocess poc #381 and Config driven detectors - part 1 #458 has been reworked. This turned out to have a problem in how it dealt with update_x_ref, since regardless of x_ref_preprocessed, we still need to update reference data within .predict() when update_x_ref is set. All offline drift detectors have been reworked to use the old logic (but with preprocess_x_ref renamed preprocess_at_init), with the addition that self.x_ref_preprocessed is also checked internally.
The previous get_config methods involved a lot of boilerplate to try to recover the original args/kwargs from detector attributes. The new approach calls a generic _set_config() within __init__, and then self.config is returned by get_config. This should significantly reduce the workload to add save/load to new detectors. To avoid memory overheads, large artefacts such as x_ref are not set at __init__, and instead are added within get_config.
Owing to the ease of implementation with the new get_config approach, save/load has been added for the model uncertainty and online detectors!
Kernels and preprocess_fn's were previously resolved in _load_detector_config, which wasn't consistent with how other artefacts were resolved (it also caused extra challenges). These are now resolved in resolve_config instead. Following this, the KernelConfigResolved and PreprocessConfigResolved pydantic models have been removed (they could be added back but it would complicate resolve_config).
Fixing determinism in Random seed utilities #496 has allowed us to compare original and loaded detector predictions in test_saving.py. This uncovered bugs with how kernels were saved and loaded. These have been fixed.
The readthedocs.yml has been fully updated to the V2 schema so that we can use Python 3.9 for building the docs. This is required as the class NDArray(Generic[T], np.ndarray[Any, T]) in utils._typing causes an error with autodoc on older Python versions.
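
The `_set_config()` / `get_config` approach described in the list above could look roughly like the sketch below. It is a minimal illustration and not the alibi-detect implementation: `ToyDetector` and the contents of `LARGE_ARTEFACTS` are assumptions made for the example.

```python
# Hypothetical: which attributes count as "large" is an assumption here.
LARGE_ARTEFACTS = ("x_ref",)


class ToyDetector:
    """Illustrative detector; not an alibi-detect class."""

    def __init__(self, x_ref, p_val=0.05, preprocess_at_init=True):
        # Capture the constructor arguments before any other local is bound.
        inputs = locals().copy()
        self._set_config(inputs)

        self.x_ref = x_ref
        self.p_val = p_val
        self.preprocess_at_init = preprocess_at_init

    def _set_config(self, inputs):
        # Store the plain args/kwargs, skipping `self` and large artefacts
        # so that the config does not hold on to the reference data.
        skip = ("self", "__class__") + LARGE_ARTEFACTS
        self.config = {k: v for k, v in inputs.items() if k not in skip}
        self.config["name"] = type(self).__name__

    def get_config(self):
        # Large artefacts are attached only when the config is requested.
        cfg = dict(self.config)
        for key in LARGE_ARTEFACTS:
            cfg[key] = getattr(self, key)
        return cfg
```

With this layout a new detector only needs the two lines at the top of `__init__`; `get_config` is inherited or shared, which is the reduction in boilerplate the description refers to.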

Future TODO's

The pydantic schema API docs need follow-up work (to be done in a separate PR to feature/config_driven_detectors). The types aren't currently visible for model fields, because fields are actually class attributes and sphinx_autodoc_typehints doesn't extract types from attributes. Possible fixes include documenting each attribute in the docstring (but then fields will be documented twice unless we turn off :undoc-members:), or using autodoc-pydantic (this might require Install alibi-detect during RTD docs build #499).
Add a dev guide on creating new detectors (i.e. how to add _set_config() for backend vs non-backend detectors etc) to the github wiki.
Update the "Saving and Loading" sections of docs methods pages.
Collate all future config related TODO's into an issue (or issues). There are a number to be collected from WIP: Config driven detectors #389 and Config driven detectors - part 3 #469. Those outstanding from this PR are:
- Tidy LSDDDrift normalization logic: Config driven detectors - part 4 #497 (comment)

ascillitoe · 2022-05-23T14:55:51Z

Thanks for all the helpful comments @jklaise! I've had a first pass at them now.

ascillitoe · 2022-05-23T15:11:12Z

P.s. @jklaise, these sections in the docs methods pages need updating:

I've added a TODO in the PR description to do this before merging into master, as I don't want to add to the file diff in this PR.

jklaise

LGTM given the discussion from the previous review.

mauicv · 2022-05-24T14:43:49Z

alibi_detect/cd/classifier.py

@@ -142,6 +143,9 @@ def __init__(
        """
        super().__init__()

+        # Get args/kwargs to set config later
+        inputs = locals().copy()


will this duplicate the memory during init even if it's later gc'd?

You have a point... looking into this...

OK you were totally correct. The above implementation would not avoid x_ref duplication as intended. We couldn't remove .copy() as this leads to backend etc in the config being altered by the later backend-specific operations. I've now moved set_config to the top of the __init__ for all detectors with a backend, which also brings them into parity with the non-backend detectors.

This required a minor change to how we get version for the config in _set_config, since self.meta is no longer set when _set_config is called:

'meta': { 'version': self.meta['version'], 'config_spec': __config_spec__, }

changed to:

'meta': { 'version': __version__, 'config_spec': __config_spec__, }

I don't think this is an issue because we import __version__ in alibi_detect.base anyway...

P.s. IMO the above complexity is a symptom of the way we set detector backends i.e. self._detector = MMDDriftTF(...). It is slightly odd how MMDDrift is not a detector itself, but only a container with a detector as an attribute.

I've been discussing with @mauicv how we might want to refactor this, and will add some ideas to #512.
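
As a small illustration of the `locals().copy()` snapshot discussed in this thread: the copy is shallow, so it creates a new dict whose values are the same objects as the arguments, and later rebinding of a local name does not change the snapshot. `init_sketch` is a hypothetical stand-in for a detector `__init__`; the sketch only shows these snapshot semantics and does not reproduce where the x_ref duplication arose in the detectors.

```python
def init_sketch(x_ref, backend="TensorFlow"):
    # New dict, but each value is the very same object that was passed in.
    inputs = locals().copy()

    # Stand-in for later backend-specific processing that rebinds a local.
    backend = backend.lower()

    return inputs, backend


x_ref = [0.0] * 1000
inputs, backend = init_sketch(x_ref)

# The snapshot still refers to the original list and the original string.
print(inputs["x_ref"] is x_ref)
print(inputs["backend"], backend)
```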

mauicv · 2022-05-24T16:51:29Z

alibi_detect/saving/loading.py

+    for k, v in cfg.items():
+        if isinstance(v, str):
+            v = prepend_dir.joinpath(Path(v))
+            if v.is_file() or v.is_dir():  # Update if prepending config_dir made config value a real filepath


Do you iterate through recursively and if it's a string you prepend the directory and see if it exists in the file system? I'm trying to think if there's some way this might backfire?

Yes exactly that. Then only update the string in the config (with the new prepended file path) if it is a file or dir.

The two ways I can think of which it might backfire are:

1. If a config contains a field which is a str but shouldn't be a filepath (e.g. a registry str), and prepending the config filepath to it happens to produce a real filepath, this field would then incorrectly be updated. I can't think of a realistic example of where this would happen, but it is theoretically possible.

2. The bigger disadvantage is that if a filepath has been spec'd in the config but it is missing/incorrect, it won't be updated with the prepended path. The FileNotFoundError raised later would then be confusing if it prints out the problem directory. The easiest thing to do is probs just go through and check no FileNotFoundError's print out the given directory...

Following up on this, I've actually decided not to remove the filepath included in FileNotFoundError's such as that in saving.tensorflow._loading.load_model:

raise FileNotFoundError(f'No .h5 file found in {model_dir}.')

Although model_dir wouldn't have the config file dir prepended to it (see 2. above), the filepath would still match that in config.toml, which might actually be a nice feature to aid the user's debugging...

@mauicv shout if you disagree, otherwise think we're good to go!
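
The behaviour discussed in this thread (prepend the config directory to every string value, and keep the new value only if it points at an existing file or directory) can be sketched as follows. `prepend_existing_paths` is a hypothetical helper written for illustration; it follows the diff excerpt above but is not the function in alibi_detect/saving/loading.py.

```python
import tempfile
from pathlib import Path


def prepend_existing_paths(cfg: dict, prepend_dir: Path) -> dict:
    """Return a copy of cfg with str values replaced by prepend_dir/value,
    but only where that path exists on disk (hypothetical helper)."""
    out = {}
    for k, v in cfg.items():
        if isinstance(v, dict):
            # Recurse into nested config sections.
            out[k] = prepend_existing_paths(v, prepend_dir)
        elif isinstance(v, str):
            candidate = prepend_dir.joinpath(Path(v))
            # Update only if prepending made the value a real filepath.
            exists = candidate.is_file() or candidate.is_dir()
            out[k] = str(candidate) if exists else v
        else:
            out[k] = v
    return out


with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    (config_dir / "x_ref.npy").touch()
    cfg = {
        "x_ref": "x_ref.npy",                         # exists: gets prepended
        "backend": "tensorflow",                      # not a path: unchanged
        "preprocess_fn": {"model": "missing_model"},  # missing: unchanged
    }
    resolved = prepend_existing_paths(cfg, config_dir)
    expected_x_ref = str(config_dir / "x_ref.npy")
```

The `missing_model` entry shows the second point above: a path that is missing on disk is left exactly as written in the config, so any later error message reports the path as the user spec'd it.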

mauicv · 2022-05-24T16:56:39Z

alibi_detect/saving/saving.py

-    if cfg is not None:
-        cfg = cfg()  # TODO - can just do detector.get_config() once all detectors have a .get_config()
+    if hasattr(detector, 'get_config'):
+        cfg = detector.get_config()  # type: ignore[union-attr]  # TODO - remove once all detectors have get_config


We should accumulate all these todo's in an issue somewhere once this is merged so that when stuff gets fixed we know what to update

alibi_detect/base.py

alibi_detect/cd/context_aware.py

alibi_detect/cd/learned_kernel.py

alibi_detect/cd/lsdd.py

alibi_detect/cd/mmd.py

alibi_detect/cd/mmd_online.py

alibi_detect/cd/spot_the_diff.py

alibi_detect/cd/tabular.py

mauicv

LGTM!

ascillitoe added 29 commits April 28, 2022 13:49

WIP test

00b27b2

WIP: tidied loading.py. In progress of reworking kernel save/load

d921fe9

standard kernel save/load refactored

9c1df84

Fix some flake8 and mypy issues

c07ff17

Kernel save/load now working. MMDDrift and ContextMMDDrift updated, i…

c8d0a58

…nc. deterministic tests.

Fixed LSDD x_ref normalization

a0d6d66

Prototype of LSDDDrift save/load with enable_config approach

58254e2

Fixed LSDD test and utils.random

d742c6a

Revert name of set_seeds to reseed

6a996cd

Fix mypy issues with numpy>=1.22.0 in utils._types

55a6b73

Implemented RNG seed utilities

17e3934

Reworked cvm and fet online tests slightly

478517e

Tweaked offline fet and cvm tests, and moved utils.random to private

48f4404

Included missing private submodule

7509148

Renamed imports to _random

ca025a0

Revert test changes and fix seed for now

2694f04

Fixes to pytest-randomly setup

83bc057

Add small docstring

1ed62d1

Merge feature/random_utils.

41853aa

Revert broken addition in schemas

73d126e

WIP: Updated get_config method for most detectors

a66e1fe

set_config() at init, except for large artefacts. preprocess_at_init …

769b522

…logic fixed

LearnedKernel save/load fixed

c9d1c55

Fixed flake8 and mypy errors

3d4bf32

All saving tests now passing

2171a61

Save/load for online detectors

7fd12c4

Revert "Replace autodoc with autoapi (#482)"

6c65075

This reverts commit bd0fa00.

ModelUncertainty save/load

b2a1716

Fix to missing line

279f780

ascillitoe added 7 commits May 23, 2022 12:59

Remove erroneous print and improve test_save_lsdd test

fcfa4bb

Replace 'is False' with 'not'. Write out S2C in full

a0609b7

Test moving as_posix to _path2str

bc1cbb6

Added 'Unresolved' to relevent schema docstrings. Fix path2str test

151581b

Revert erroneous changes to background.md links

c0f8a30

Tweak to base set_config. Add note re POSIX filepaths

d9d6c34

Remove backend name stripping

721079f

Update kwargs in docs

98bc604

jklaise approved these changes May 24, 2022

This was referenced May 24, 2022

Backend-specific detectors inherit incorrect name in meta #514

Open

Check that backend-dependent kwarg's are properly documented #515

Open

Add note re LARGE_ARTEFACTS

cab8084

mauicv reviewed May 24, 2022

Move set_config in detectors with backend

ab3562f

mauicv reviewed May 24, 2022
