unquad is a wrapper for most PyOD detectors (see Supported Estimators) that enables uncertainty-quantified anomaly detection based on one-class classification and the principles of conformal inference.
pip install unquad
Note the optional dependencies, e.g. for using deep learning models (see pyproject.toml).
Conformal Anomaly Detection (CAD) applies the principles of conformal inference (conformal prediction) to anomaly detection. It focuses on controlling error metrics such as the false discovery rate (FDR) while maintaining statistical power.
CAD converts anomaly scores into p-values by comparing the anomaly scores of test data against those of calibration data drawn from the training data (normal instances). The p-value of a test score is its normalized rank among the calibration scores. These statistically valid p-values enable error control through methods such as Benjamini-Hochberg, replacing traditional anomaly estimates that lack statistical guarantees.
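The rank-based conversion described above can be sketched in a few lines of plain Python. This is a minimal illustration, not unquad's implementation: the function name `conformal_p_values` is hypothetical, and it assumes that higher scores mean "more anomalous" and that the common `(1 + count) / (n + 1)` normalization is used.

```python
from bisect import bisect_left


def conformal_p_values(calib_scores, test_scores):
    """Return one conformal p-value per test score (hypothetical helper).

    Assumes higher scores mean "more anomalous" and that the calibration
    scores come from normal data not used to fit the detector.
    """
    calib_sorted = sorted(calib_scores)
    n = len(calib_sorted)
    p_values = []
    for score in test_scores:
        # Number of calibration scores that are >= the test score.
        n_greater_equal = n - bisect_left(calib_sorted, score)
        # The +1 terms count the test point itself, keeping p in (0, 1].
        p_values.append((1 + n_greater_equal) / (n + 1))
    return p_values


calib = [0.1, 0.2, 0.3, 0.4]
p_values = conformal_p_values(calib, [0.05, 0.35, 0.9])
print(p_values)
```

A test score that exceeds every calibration score receives the smallest attainable p-value, `1 / (n + 1)`, so the size of the calibration set bounds how small a p-value can get.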
Using the default behavior of ConformalDetector() with the default DetectorConfig():
from pyod.models.gmm import GMM
from unquad.strategy.split import Split
from unquad.estimation.conformal import ConformalDetector
from unquad.data.load import load_shuttle
from unquad.utils.metrics import false_discovery_rate, statistical_power
x_train, x_test, y_test = load_shuttle(setup=True)
ce = ConformalDetector(
    detector=GMM(),
    strategy=Split(calib_size=1_000)
)
ce.fit(x_train)
estimates = ce.predict(x_test)
print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=estimates)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")
Output:
Empirical FDR: 0.108
Empirical Power: 0.892
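For readers unfamiliar with the two reported metrics, the sketch below shows how they are commonly defined. It is an assumption about the definitions, not a copy of `unquad.utils.metrics`: the helper names `empirical_fdr` and `empirical_power` are hypothetical, and labels are assumed to be encoded as 1 for anomalies and 0 for normal instances.

```python
def empirical_fdr(y, y_hat):
    """Share of flagged points that are actually normal (0.0 if nothing is flagged)."""
    false_positives = sum(1 for truth, pred in zip(y, y_hat) if pred and not truth)
    flagged = sum(1 for pred in y_hat if pred)
    return false_positives / flagged if flagged else 0.0


def empirical_power(y, y_hat):
    """Share of true anomalies that are flagged (0.0 if there are no anomalies)."""
    true_positives = sum(1 for truth, pred in zip(y, y_hat) if pred and truth)
    anomalies = sum(1 for truth in y if truth)
    return true_positives / anomalies if anomalies else 0.0


# Toy labels: three true anomalies, three flagged points, one of them wrongly.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
fdr = empirical_fdr(y_true, y_pred)
power = empirical_power(y_true, y_pred)
```

Here one of the three flagged points is a false alarm and two of the three anomalies are found.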
The behavior can be customized by changing the DetectorConfig():
from dataclasses import dataclass

from unquad.utils.enums import Adjustment, Aggregation


@dataclass
class DetectorConfig:
    alpha: float = 0.2  # Nominal FDR value
    adjustment: Adjustment = Adjustment.BH  # Multiple testing procedure
    aggregation: Aggregation = Aggregation.MEDIAN  # Score aggregation (if applicable)
    seed: int = 1  # Random seed
    silent: bool = True
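To make the roles of alpha and the Benjamini-Hochberg adjustment concrete, here is a minimal sketch of the standard BH step-up procedure applied to a list of p-values. The function `benjamini_hochberg` is a hypothetical helper written for illustration; it is not the code path unquad uses internally.

```python
def benjamini_hochberg(p_values, alpha=0.2):
    """Return a list of booleans: True where a point is flagged as anomalous."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Flag every point whose p-value is ranked at or below k_max.
    decisions = [False] * m
    for idx in order[:k_max]:
        decisions[idx] = True
    return decisions


p_values = [0.01, 0.04, 0.03, 0.5, 0.9]
decisions = benjamini_hochberg(p_values, alpha=0.2)
print(decisions)
```

Lowering alpha tightens the per-rank thresholds, so fewer points are flagged and the nominal FDR bound becomes stricter.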
The Bootstrap() strategy lets you set two of the three parameters resampling_ratio, n_bootstraps and n_calib. For any such pair, the remaining parameter is derived automatically. This allows exact control over the calibration procedure when using a bootstrap strategy.
from pyod.models.iforest import IForest
from unquad.estimation.properties.configuration import DetectorConfig
from unquad.estimation.conformal import ConformalDetector
from unquad.strategy.bootstrap import Bootstrap
from unquad.utils.enums import Aggregation, Adjustment
from unquad.data.load import load_shuttle
from unquad.utils.metrics import false_discovery_rate, statistical_power
x_train, x_test, y_test = load_shuttle(setup=True)
ce = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Bootstrap(resampling_ratio=0.99, n_bootstraps=20, plus=True),
    config=DetectorConfig(alpha=0.1, adjustment=Adjustment.BH, aggregation=Aggregation.MEAN),
)
ce.fit(x_train)
estimates = ce.predict(x_test)
print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=estimates)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")
Output:
Empirical FDR: 0.067
Empirical Power: 0.933
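The aggregation setting in the configuration above matters when several fitted detectors each score the same point. The following sketch only illustrates the difference between mean and median aggregation on made-up numbers; it assumes, without confirmation from the package, that scores are combined per point across the bootstrap models.

```python
from statistics import mean, median

# Hypothetical anomaly scores that five bootstrap-fitted detectors
# assign to the same point (illustrative values, not real output).
scores_per_model = [0.41, 0.39, 0.43, 0.40, 0.95]

aggregated_median = median(scores_per_model)  # robust to the one outlying model
aggregated_mean = mean(scores_per_model)      # pulled upwards by the 0.95

print(aggregated_median, aggregated_mean)
```

The median ignores the single deviating model, whereas the mean shifts noticeably towards it, which is one reason to prefer the median when individual models are unstable.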
The package only supports anomaly estimators that are suitable for unsupervised one-class classification. Because these detectors are fitted exclusively on normal (non-anomalous) data, parameters such as threshold are set internally to their smallest possible values.
Models that are currently supported include:
- Angle-Based Outlier Detection (ABOD)
- Autoencoder (AE)
- Cook's Distance (CD)
- Copula-based Outlier Detector (COPOD)
- Deep Isolation Forest (DIF)
- Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
- Gaussian Mixture Model (GMM)
- Histogram-based Outlier Detection (HBOS)
- Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
- Isolation Forest (IForest)
- Kernel Density Estimation (KDE)
- k-Nearest Neighbor (kNN)
- Kernel Principal Component Analysis (KPCA)
- Linear Model Deviation-based Outlier Detection (LMDD)
- Local Outlier Factor (LOF)
- Local Correlation Integral (LOCI)
- Lightweight Online Detector of Anomalies (LODA)
- Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
- GNN-based Anomaly Detection Method (LUNAR)
- Median Absolute Deviation (MAD)
- Minimum Covariance Determinant (MCD)
- One-Class SVM (OCSVM)
- Principal Component Analysis (PCA)
- Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
- Rotation-based Outlier Detection (ROD)
- Subspace Outlier Detection (SOD)
- Scalable Unsupervised Outlier Detection (SUOD)
Bug reporting: https://github.com/OliverHennhoefer/unquad/issues