Synthetic anomaly for testing and validation (#634)
* move sample generation to datamodule instead of dataset

* move sample generation from init to setup

* remove inference stage and add base classes

* replace dataset classes with AnomalibDataset

* move setup to base class, create samples as class method

* update docstrings

* refactor btech to new format

* allow training with no anomalous data

* remove MVTec name from comment

* raise NotImplementedError in base class

* allow both png and bmp images for btech

* use label_index to check if dataset contains anomalous images

* refactor getitem in dataset class

* use iloc for indexing

* move dataloader getters to base class

* refactor to add validate stage in setup

* implement alternative datamodules solution

* small improvements

* improve design

* remove unused constructor arguments

* adapt btech to new design

* add prepare_data method for mvtec

* implement more generic random splitting function

* update docstrings for folder module

* ensure type consistency when performing operations on dataset

* change imports

* change variable names

* replace pass with NotImplementedError

* allow training on folder without test images

* use relative path for normal_test_dir

* fix dataset tests

* update validation set parameter in configs

* change default argument

* use setter for samples

* hint options for val_split_mode

* update assert message and docstring

* revert name change dataset vs datamodule

* typing and docstrings

* remove samples argument from dataset constructor

* val/test -> eval

* remove Split.Full from enum

* sort samples when setting

* update warn message

* formatting

* use setter when creating samples in dataset classes

* add tests for new dataset class

* add test case for label aware random split

* update parameter name in inferencers

* move _setup implementation to base class

* address codacy issues

* fix pylint issues

* codacy

* update example dataset config in docs

* fix test

* move base classes to separate files (avoid circular import)

* add synthetic dataset class

* move augmenter to data directory

* add base classes

* update docstring

* use synthetic dataset in base datamodule

* fix imports

* clean up synthetic anomaly dataset implementation

* fix mistake in augmenter

* change default split ratio

* remove accidentally added file

* validation_split_mode -> val_split_mode

* update docs

* Update anomalib/data/base/dataset.py

Co-authored-by: Joao P C Bertoldo <[email protected]>

* get length from self.samples

* assert unique indices

* check is_setup for individual datasets

Co-authored-by: Joao P C Bertoldo <[email protected]>

* remove assert in __getitem__

Co-authored-by: Joao P C Bertoldo <[email protected]>

* Update anomalib/data/btech.py

Co-authored-by: Joao P C Bertoldo <[email protected]>

* clearer assert message

* clarify list inversion in comment

* comments and typing

* validate contents of samples dataframe before setting

* add file paths check

* add seed to random_split function

* fix expected columns

* fix typo

* add seed parameter to datamodules

* set global seed in test entrypoint

* add NONE option to valsplitmode

* clarify setup behaviour in docstring

* add logging message

* use val_split_ratio for synthetic validation set

* pathlib

* make synthetic anomaly available for test set

* update configs

* add tests

* simplify test set splitting logic

* update docstring

* add missing licence

* split_normal_and_anomalous -> split_by_label

* VideoAnomalib -> AnomalibVideo

Co-authored-by: Joao P C Bertoldo <[email protected]>
djdameln and jpcbertoldo authored Dec 14, 2022
1 parent d210feb commit 67462e7
Showing 27 changed files with 560 additions and 79 deletions.
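The headline feature, `SyntheticAnomalyDataset`, converts normal images into synthetic anomalous ones with pixel-level ground truth. A rough standalone sketch of the underlying idea — using a random noise patch in place of anomalib's actual Perlin-noise `Augmenter`; the region sizes and helper name are illustrative assumptions, not the library API:

```python
import numpy as np

def make_synthetic_anomaly(image: np.ndarray, seed: int = None) -> tuple:
    """Corrupt a random region of a normal image; return (augmented_image, mask)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    # pick a random rectangular region to corrupt (size bounds are arbitrary here)
    ph, pw = rng.integers(h // 8, h // 4), rng.integers(w // 8, w // 4)
    y, x = rng.integers(0, h - ph), rng.integers(0, w - pw)
    augmented = image.copy()
    noise = rng.integers(0, 256, size=(ph, pw, image.shape[2]), dtype=image.dtype)
    augmented[y : y + ph, x : x + pw] = noise
    # ground-truth segmentation mask marking the injected anomaly
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y : y + ph, x : x + pw] = 1
    return augmented, mask

normal = np.full((64, 64, 3), 128, dtype=np.uint8)
augmented, mask = make_synthetic_anomaly(normal, seed=0)
```

Because each generated sample carries its own mask, such a dataset can stand in for a real anomalous test or validation set when only normal images are available.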
7 changes: 6 additions & 1 deletion anomalib/data/__init__.py
@@ -44,6 +44,8 @@ def get_datamodule(config: Union[DictConfig, ListConfig]) -> AnomalibDataModule:
task=config.dataset.task,
transform_config_train=config.dataset.transform_config.train,
transform_config_eval=config.dataset.transform_config.eval,
test_split_mode=config.dataset.test_split_mode,
test_split_ratio=config.dataset.test_split_ratio,
val_split_mode=config.dataset.val_split_mode,
val_split_ratio=config.dataset.val_split_ratio,
)
@@ -58,6 +60,8 @@ def get_datamodule(config: Union[DictConfig, ListConfig]) -> AnomalibDataModule:
task=config.dataset.task,
transform_config_train=config.dataset.transform_config.train,
transform_config_eval=config.dataset.transform_config.eval,
test_split_mode=config.dataset.test_split_mode,
test_split_ratio=config.dataset.test_split_ratio,
val_split_mode=config.dataset.val_split_mode,
val_split_ratio=config.dataset.val_split_ratio,
)
@@ -70,13 +74,14 @@ def get_datamodule(config: Union[DictConfig, ListConfig]) -> AnomalibDataModule:
normal_test_dir=config.dataset.normal_test_dir,
mask_dir=config.dataset.mask_dir,
extensions=config.dataset.extensions,
normal_split_ratio=config.dataset.normal_split_ratio,
image_size=(config.dataset.image_size[0], config.dataset.image_size[1]),
train_batch_size=config.dataset.train_batch_size,
eval_batch_size=config.dataset.eval_batch_size,
num_workers=config.dataset.num_workers,
transform_config_train=config.dataset.transform_config.train,
transform_config_eval=config.dataset.transform_config.eval,
test_split_mode=config.dataset.test_split_mode,
test_split_ratio=config.dataset.test_split_ratio,
val_split_mode=config.dataset.val_split_mode,
val_split_ratio=config.dataset.val_split_ratio,
)
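For reference, the dataset config keys read by `get_datamodule` above could look like the following fragment — the key names follow this diff, while the values shown are only illustrative defaults:

```yaml
dataset:
  test_split_mode: from_dir     # or "synthetic"
  test_split_ratio: 0.2
  val_split_mode: same_as_test  # or "from_test", "synthetic", "none"
  val_split_ratio: 0.5
```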
8 changes: 5 additions & 3 deletions anomalib/data/avenue.py
@@ -26,7 +26,7 @@
from pandas import DataFrame
from torch import Tensor

from anomalib.data.base import AnomalibDataModule, VideoAnomalibDataset
from anomalib.data.base import AnomalibVideoDataModule, AnomalibVideoDataset
from anomalib.data.task_type import TaskType
from anomalib.data.utils import DownloadProgressBar, Split, ValSplitMode, hash_check
from anomalib.data.utils.video import ClipsIndexer
@@ -121,7 +121,7 @@ def get_mask(self, idx) -> Optional[Tensor]:
return masks


class AvenueDataset(VideoAnomalibDataset):
class AvenueDataset(AnomalibVideoDataset):
"""Avenue Dataset class.
Args:
Expand Down Expand Up @@ -156,7 +156,7 @@ def _setup(self):
self.samples = make_avenue_dataset(self.root, self.gt_dir, self.split)


class Avenue(AnomalibDataModule):
class Avenue(AnomalibVideoDataModule):
"""Avenue DataModule class.
Args:
@@ -177,6 +177,8 @@ class Avenue(AnomalibDataModule):
during validation.
Defaults to None.
val_split_mode (ValSplitMode): Setting that determines how the validation subset is obtained.
val_split_ratio (float): Fraction of train or test images that will be reserved for validation.
seed (Optional[int], optional): Seed which may be set to a fixed value for reproducibility.
"""

def __init__(
4 changes: 2 additions & 2 deletions anomalib/data/base/__init__.py
@@ -6,6 +6,6 @@

from .datamodule import AnomalibDataModule
from .dataset import AnomalibDataset
from .video import VideoAnomalibDataset
from .video import AnomalibVideoDataModule, AnomalibVideoDataset

__all__ = ["AnomalibDataset", "AnomalibDataModule", "VideoAnomalibDataset"]
__all__ = ["AnomalibDataset", "AnomalibDataModule", "AnomalibVideoDataset", "AnomalibVideoDataModule"]
44 changes: 43 additions & 1 deletion anomalib/data/base/datamodule.py
@@ -15,7 +15,13 @@
from torch.utils.data import DataLoader, default_collate

from anomalib.data.base.dataset import AnomalibDataset
from anomalib.data.utils import ValSplitMode, random_split
from anomalib.data.synthetic import SyntheticAnomalyDataset
from anomalib.data.utils import (
TestSplitMode,
ValSplitMode,
random_split,
split_by_label,
)

logger = logging.getLogger(__name__)

@@ -60,12 +66,16 @@ def __init__(
num_workers: int,
val_split_mode: ValSplitMode,
val_split_ratio: float,
test_split_mode: Optional[TestSplitMode] = None,
test_split_ratio: Optional[float] = None,
seed: Optional[int] = None,
):
super().__init__()
self.train_batch_size = train_batch_size
self.eval_batch_size = eval_batch_size
self.num_workers = num_workers
self.test_split_mode = test_split_mode
self.test_split_ratio = test_split_ratio
self.val_split_mode = val_split_mode
self.val_split_ratio = val_split_ratio
self.seed = seed
@@ -101,12 +111,44 @@ def _setup(self, _stage: Optional[str] = None) -> None:

self.train_data.setup()
self.test_data.setup()

self._create_test_split()
self._create_val_split()

def _create_test_split(self):
"""Obtain the test set based on the settings in the config."""
if self.test_data.has_normal:
# split the test data into normal and anomalous so these can be processed separately
normal_test_data, self.test_data = split_by_label(self.test_data)
else:
# when the user did not provide any normal images for testing, we sample some from the training set
logger.info(
"No normal test images found. Sampling from training set using a split ratio of %.2f",
self.test_split_ratio,
)
self.train_data, normal_test_data = random_split(self.train_data, self.test_split_ratio)

if self.test_split_mode == TestSplitMode.FROM_DIR:
self.test_data += normal_test_data
elif self.test_split_mode == TestSplitMode.SYNTHETIC:
self.test_data = SyntheticAnomalyDataset.from_dataset(normal_test_data)
else:
raise ValueError(f"Unsupported Test Split Mode: {self.test_split_mode}")

def _create_val_split(self):
"""Obtain the validation set based on the settings in the config."""
if self.val_split_mode == ValSplitMode.FROM_TEST:
# randomly sampled from test set
self.test_data, self.val_data = random_split(
self.test_data, self.val_split_ratio, label_aware=True, seed=self.seed
)
elif self.val_split_mode == ValSplitMode.SAME_AS_TEST:
# equal to test set
self.val_data = self.test_data
elif self.val_split_mode == ValSplitMode.SYNTHETIC:
# converted from random training sample
self.train_data, normal_val_data = random_split(self.train_data, self.val_split_ratio)
self.val_data = SyntheticAnomalyDataset.from_dataset(normal_val_data)
elif self.val_split_mode != ValSplitMode.NONE:
raise ValueError(f"Unknown validation split mode: {self.val_split_mode}")

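The `label_aware=True` splitting used in `_create_val_split` above keeps the normal/anomalous ratio intact in both resulting subsets. A minimal standalone sketch of the idea — this is not the actual `random_split` from `anomalib.data.utils`, just the stratified-shuffle logic it relies on, written for plain `(path, label)` pairs:

```python
import random

def label_aware_random_split(samples, split_ratio, seed=None):
    """Split (path, label) pairs so each label keeps the same ratio in both subsets."""
    rng = random.Random(seed)
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    first, second = [], []
    # shuffle and split each label group independently, then recombine
    for items in by_label.values():
        items = items[:]
        rng.shuffle(items)
        n_second = int(len(items) * split_ratio)
        second.extend(items[:n_second])
        first.extend(items[n_second:])
    return first, second

samples = [(f"img_{i}.png", i % 4 == 0) for i in range(100)]  # 25 anomalous, 75 normal
train, val = label_aware_random_split(samples, split_ratio=0.2, seed=42)
```

Splitting per label group this way avoids the degenerate case where a plain random split leaves the validation set with no anomalous samples at all.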
29 changes: 26 additions & 3 deletions anomalib/data/base/video.py
@@ -6,14 +6,15 @@
import torch
from torch import Tensor

from anomalib.data.base.datamodule import AnomalibDataModule
from anomalib.data.base.dataset import AnomalibDataset
from anomalib.data.task_type import TaskType
from anomalib.data.utils import masks_to_boxes
from anomalib.data.utils import ValSplitMode, masks_to_boxes
from anomalib.data.utils.video import ClipsIndexer
from anomalib.pre_processing import PreProcessor


class VideoAnomalibDataset(AnomalibDataset, ABC):
class AnomalibVideoDataset(AnomalibDataset, ABC):
"""Base video anomalib dataset class.
Args:
@@ -48,7 +49,7 @@ def samples(self):
@samples.setter
def samples(self, samples):
"""Overwrite samples and re-index subvideos."""
super(VideoAnomalibDataset, self.__class__).samples.fset(self, samples)
super(AnomalibVideoDataset, self.__class__).samples.fset(self, samples)
self._setup_clips()

def _setup_clips(self) -> None:
@@ -93,3 +94,25 @@ def __getitem__(self, index: int) -> Dict[str, Union[str, Tensor]]:
item.pop("mask")

return item


class AnomalibVideoDataModule(AnomalibDataModule):
"""Base class for video data modules."""

def _setup(self, _stage: Optional[str] = None) -> None:
"""Set up the datasets and perform dynamic subset splitting.
This method may be overridden in subclass for custom splitting behaviour.
Video datamodules are not compatible with synthetic anomaly generation.
"""
assert self.train_data is not None
assert self.test_data is not None

self.train_data.setup()
self.test_data.setup()

if self.val_split_mode == ValSplitMode.SYNTHETIC:
raise ValueError(f"Val split mode {self.val_split_mode} not supported for video datasets.")

self._create_val_split()
17 changes: 16 additions & 1 deletion anomalib/data/btech.py
@@ -25,7 +25,13 @@

from anomalib.data.base import AnomalibDataModule, AnomalibDataset
from anomalib.data.task_type import TaskType
from anomalib.data.utils import DownloadProgressBar, Split, ValSplitMode, hash_check
from anomalib.data.utils import (
DownloadProgressBar,
Split,
TestSplitMode,
ValSplitMode,
hash_check,
)
from anomalib.pre_processing import PreProcessor

logger = logging.getLogger(__name__)
@@ -181,6 +187,8 @@ def __init__(
task: TaskType = TaskType.SEGMENTATION,
transform_config_train: Optional[Union[str, A.Compose]] = None,
transform_config_eval: Optional[Union[str, A.Compose]] = None,
test_split_mode: TestSplitMode = TestSplitMode.FROM_DIR,
test_split_ratio: float = 0.2,
val_split_mode: ValSplitMode = ValSplitMode.SAME_AS_TEST,
val_split_ratio: float = 0.5,
seed: Optional[int] = None,
@@ -199,6 +207,11 @@ def __init__(
transform_config_val: Config for pre-processing during validation.
create_validation_set: Create a validation subset in addition to the train and test subsets
seed (Optional[int], optional): Seed used during random subset splitting.
test_split_mode (TestSplitMode): Setting that determines how the testing subset is obtained.
test_split_ratio (float): Fraction of images from the train set that will be reserved for testing.
val_split_mode (ValSplitMode): Setting that determines how the validation subset is obtained.
val_split_ratio (float): Fraction of train or test images that will be reserved for validation.
seed (Optional[int], optional): Seed which may be set to a fixed value for reproducibility.
Examples:
>>> from anomalib.data import BTech
@@ -230,6 +243,8 @@ def __init__(
train_batch_size=train_batch_size,
eval_batch_size=eval_batch_size,
num_workers=num_workers,
test_split_mode=test_split_mode,
test_split_ratio=test_split_ratio,
val_split_mode=val_split_mode,
val_split_ratio=val_split_ratio,
seed=seed,
24 changes: 8 additions & 16 deletions anomalib/data/folder.py
@@ -15,7 +15,7 @@

from anomalib.data.base import AnomalibDataModule, AnomalibDataset
from anomalib.data.task_type import TaskType
from anomalib.data.utils import Split, ValSplitMode, random_split
from anomalib.data.utils import Split, TestSplitMode, ValSplitMode
from anomalib.pre_processing.pre_process import PreProcessor


@@ -270,7 +270,10 @@ class Folder(AnomalibDataModule):
transform_config_val (Optional[Union[str, A.Compose]], optional): Config for pre-processing
during validation.
Defaults to None.
test_split_mode (TestSplitMode): Setting that determines how the testing subset is obtained.
test_split_ratio (float): Fraction of images from the train set that will be reserved for testing.
val_split_mode (ValSplitMode): Setting that determines how the validation subset is obtained.
val_split_ratio (float): Fraction of train or test images that will be reserved for validation.
seed (Optional[int], optional): Seed used during random subset splitting.
"""

@@ -291,6 +294,8 @@ def __init__(
task: TaskType = TaskType.SEGMENTATION,
transform_config_train: Optional[Union[str, A.Compose]] = None,
transform_config_eval: Optional[Union[str, A.Compose]] = None,
test_split_mode: TestSplitMode = TestSplitMode.FROM_DIR,
test_split_ratio: float = 0.2,
val_split_mode: ValSplitMode = ValSplitMode.FROM_TEST,
val_split_ratio: float = 0.5,
seed: Optional[int] = None,
@@ -299,6 +304,8 @@ def __init__(
train_batch_size=train_batch_size,
eval_batch_size=eval_batch_size,
num_workers=num_workers,
test_split_mode=test_split_mode,
test_split_ratio=test_split_ratio,
val_split_mode=val_split_mode,
val_split_ratio=val_split_ratio,
seed=seed,
@@ -332,18 +339,3 @@ def __init__(
mask_dir=mask_dir,
extensions=extensions,
)

def _setup(self, _stage: Optional[str] = None):
"""Set up the datasets for the Folder Data Module."""
assert self.train_data is not None
assert self.test_data is not None

self.train_data.setup()
self.test_data.setup()

# add some normal images to the test set
if not self.test_data.has_normal:
self.train_data, normal_test_data = random_split(self.train_data, self.normal_split_ratio, seed=self.seed)
self.test_data += normal_test_data

super()._setup()
36 changes: 34 additions & 2 deletions anomalib/data/mvtec.py
@@ -34,7 +34,13 @@

from anomalib.data.base import AnomalibDataModule, AnomalibDataset
from anomalib.data.task_type import TaskType
from anomalib.data.utils import DownloadProgressBar, Split, ValSplitMode, hash_check
from anomalib.data.utils import (
DownloadProgressBar,
Split,
TestSplitMode,
ValSplitMode,
hash_check,
)
from anomalib.pre_processing import PreProcessor

logger = logging.getLogger(__name__)
@@ -149,7 +155,29 @@ def _setup(self):


class MVTec(AnomalibDataModule):
"""MVTec Datamodule."""
"""MVTec Datamodule.
Args:
root (str): Path to the root of the dataset
category (str): Category of the MVTec dataset (e.g. "bottle" or "cable").
image_size (Optional[Union[int, Tuple[int, int]]], optional): Size of the input image.
Defaults to None.
train_batch_size (int, optional): Training batch size. Defaults to 32.
eval_batch_size (int, optional): Test batch size. Defaults to 32.
num_workers (int, optional): Number of workers. Defaults to 8.
task (TaskType): Task type, 'classification', 'detection' or 'segmentation'
transform_config_train (Optional[Union[str, A.Compose]], optional): Config for pre-processing
during training.
Defaults to None.
transform_config_val (Optional[Union[str, A.Compose]], optional): Config for pre-processing
during validation.
Defaults to None.
test_split_mode (TestSplitMode): Setting that determines how the testing subset is obtained.
test_split_ratio (float): Fraction of images from the train set that will be reserved for testing.
val_split_mode (ValSplitMode): Setting that determines how the validation subset is obtained.
val_split_ratio (float): Fraction of train or test images that will be reserved for validation.
seed (Optional[int], optional): Seed which may be set to a fixed value for reproducibility.
"""

def __init__(
self,
@@ -162,6 +190,8 @@ def __init__(
task: TaskType = TaskType.SEGMENTATION,
transform_config_train: Optional[Union[str, A.Compose]] = None,
transform_config_eval: Optional[Union[str, A.Compose]] = None,
test_split_mode: TestSplitMode = TestSplitMode.FROM_DIR,
test_split_ratio: float = 0.2,
val_split_mode: ValSplitMode = ValSplitMode.SAME_AS_TEST,
val_split_ratio: float = 0.5,
seed: Optional[int] = None,
@@ -170,6 +200,8 @@ def __init__(
train_batch_size=train_batch_size,
eval_batch_size=eval_batch_size,
num_workers=num_workers,
test_split_mode=test_split_mode,
test_split_ratio=test_split_ratio,
val_split_mode=val_split_mode,
val_split_ratio=val_split_ratio,
seed=seed,