-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Config driven detectors - part 4 #497
Config driven detectors - part 4 #497
Conversation
…nc. deterministic tests.
This reverts commit bd0fa00.
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Thanks for all the helpful comments @jklaise! I've had a first pass at them now. |
P.s. @jklaise, these sections in the docs methods pages need updating: I've added a TODO in the PR description to do this before merging into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM given the discussion from the previous review.
alibi_detect/cd/classifier.py
Outdated
@@ -142,6 +143,9 @@ def __init__( | |||
""" | |||
super().__init__() | |||
|
|||
# Get args/kwargs to set config later | |||
inputs = locals().copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this duplicate the memory during init even if it's later gc'd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a point... looking into this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK you were totally correct. The above implementation would not avoid x_ref
duplication as intended. We couldn't remove .copy()
as this leads to backend
etc in the config being altered by the later backend-specific operations. I've now moved set_config
to the top of the __input__
for all detectors with a backend, which also brings them into parity with the non-backend detectors.
This required a minor change to how we get version
for the config in _set_config
, since self.meta
is no longer set when _set_config
is called:
'meta': {
'version': self.meta['version'],
'config_spec': __config_spec__,
}
changed to:
'meta': {
'version': __version__,
'config_spec': __config_spec__,
}
Don't think this is as an issue because we import __version__
in alibi_detect.base
anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.s. IMO the above complexity is a symptom of the way we set detector backends i.e. self._detector = MMDDriftTF(...)
. It is slightly odd how MMDDrift
is not a detector itself, but only a container with a detector as an attribute.
I've been discussing with @mauicv how we might want to refactor this, and will add some ideas to #512.
for k, v in cfg.items(): | ||
if isinstance(v, str): | ||
v = prepend_dir.joinpath(Path(v)) | ||
if v.is_file() or v.is_dir(): # Update if prepending config_dir made config value a real filepath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you iterate through recursively and if it's a string you prepend the directory and see if it exists in the file system? I'm trying to think if there's some way this might backfire?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly that. Then only update the string in the config (with the new prepended file path) if it is a file or dir.
The two ways I can think of which it might backfire are:
- If a config contains a field which a
str
but shouldn't be a filepath (e.g. a regsitry str), whose string is such that when prepending the config filepath to it the new string is actually a filepath. This field would then incorrectly be updated. I can't think of a realistic example of where this would happen, but it is theoretically possible. - The bigger disadvantage is that if a filepath has been spec'd in the config but it is missing/incorrect, it won't be updated with the prepended path. The
FileNotFoundError
error raised later would then be confusing if it prints out the problem directory. The easiest thing to do is probs just go through and check noFileNotFoundError
's print out the given directory...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on this, I've actually decided not to remove the filepath included in FileNotFoundError
's such as that in saving.tensorflow._loading.load_model
:
raise FileNotFoundError(f'No .h5 file found in {model_dir}.')
Although model_dir
wouldn't have the config file dir prepended to it (see 2. above), the filepath would still match that in config.toml
, which might actually be a nice feature to aid the user's debugging...
@mauicv shout if you disagree, otherwise think we're good to go!
if cfg is not None: | ||
cfg = cfg() # TODO - can just do detector.get_config() once all detectors have a .get_config() | ||
if hasattr(detector, 'get_config'): | ||
cfg = detector.get_config() # type: ignore[union-attr] # TODO - remove once all detectors have get_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should accumulate all these todo's in an issue somewhere once this is merged so that when stuff gets fixed we know what to update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This is the fourth (and final!) part of a series of PR's for the config-driven detector functionality. The original PR (#389) has been split into a number of smaller PR's to aid the review process.
Summary of PR
This PR implements a number of final fixes and refactorings:
preprocess_at_init
andpreprocess_at_pred
logic implemented in Preprocess poc #381 and Config driven detectors - part 1 #458 has been reworked. This turned out to have a problem in how it dealt withupdate_x_ref
, since regardless ofx_ref_preprocessed
, we still need to update reference data within.predict()
whenupdate_x_ref
is set. All offline drift detectors have been reworked to use the old logic (but withpreprocess_x_ref
renamedpreprocess_at_init
), with the addition thatself.x_ref_preprocessed
is also checked internally.get_config
methods involved a lot of boilerplate to try to recover the original args/kwargs from detector attributes. The new approach calls a generic_set_config()
with__init__
, and thenself.config
is returned byget_config
. This should significantly reduce the workload to add save/load to new detectors. To avoid memory overheads, large artefacts such asx_ref
are not set at__init__
, and instead are added withinget_config
.get_config
approach, save/load has been added for the model uncertainty and online detectors!preprocess_fn
's were previously resolved in_load_detector_config
, which wasn't consistent with how other artefacts were resolved (it also caused added extra challenges). These are now resolved inresolve_config
instead. Following this theKernelConfigResolved
andPreprocessConfigResolved
pydantic models have been removed (they could be added back but it would complicateresolve_config
).test_saving.py
. This uncovered bugs with how kernels were saved and loaded. These have been fixed.class NDArray(Generic[T], np.ndarray[Any, T])
inutils._typing
causes an error withautodoc
on older Python versions.Future TODO's
feature/config_driven_detectors
). The types aren't currently visible for model fields, because fields are actually class attributes andsphinx_autodoc_typehints
doesn't extract types from attributes. Possible fixes include documenting each attribute in the docstring (but then fields will be documented twice unless we turn off:undoc-members:
, or usingautodoc-pydantic
(this might require Install alibi-detect during RTD docs build #499)._set_config()
for backend vs non-backend detectors etc) to the github wiki.LSDDDrift
normalization logic: Config driven detectors - part 4 #497 (comment)