AD-LLM: Benchmarking Large Language Models for Anomaly Detection

Overview

AD-LLM introduces the first benchmark that evaluates how large language models (LLMs) can assist with natural language processing (NLP) tasks in anomaly detection (AD). We consider three key tasks:

1. Zero-shot Detection: Using LLMs' pre-trained knowledge to detect anomalies without task-specific training.
2. Data Augmentation:
   a. Synthetic Data Generation: Generating synthetic data to improve AD models.
   b. Category Descriptions Generation: Creating category descriptions to enhance LLM-based AD.
3. Model Selection: Suggesting suitable unsupervised AD models through LLM-guided recommendations.

Our benchmark evaluates LLMs such as GPT-4 and Llama 3.1 across multiple datasets, providing a clear assessment of their capabilities in AD scenarios.

Citation

If you find this work useful, please cite our paper:

Paper Link: https://arxiv.org/abs/2412.11142

@article{yang2024ad,
  title={AD-LLM: Benchmarking Large Language Models for Anomaly Detection},
  author={Yang, Tiankai and Nian, Yi and Li, Shawn and Xu, Ruiyao and Li, Yuangang and Lin, Jiaqi and Xiao, Zhuo and Hu, Xiyang and Rossi, Ryan and Ding, Kaize and Hu, Xia and Zhao, Yue},
  journal={arXiv preprint arXiv:2412.11142},
  year={2024}
}

Environment Set-up

We use Anaconda to create a Python environment and install the required libraries:

# create the environment and activate it
conda create --name ad_llm python=3.11
conda activate ad_llm

# install basic packages
conda install numpy scipy scikit-learn matplotlib tqdm

# install PyTorch (adjust the CUDA version accordingly)
conda install pytorch torchvision pytorch-cuda=12.4 -c pytorch -c nvidia

# install PyOD
conda install -c conda-forge pyod

# install libraries for Llama
conda install -c conda-forge transformers
pip install --upgrade huggingface_hub
pip install accelerate

# install libraries for GPT
pip install openai
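
After installation, a quick check that every package can be found may save a failed run later. The snippet below is a minimal sketch and not part of the repository; the list of import names is our assumption based on the commands above (scikit-learn imports as sklearn, PyTorch as torch).

```python
import importlib.util

# Import names for the packages installed above. This list is an assumption
# derived from the install commands, not something the repository ships.
REQUIRED = [
    "numpy", "scipy", "sklearn", "matplotlib", "tqdm",
    "torch", "pyod", "transformers", "huggingface_hub", "accelerate", "openai",
]


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```

Run it inside the activated ad_llm environment; any name it reports as missing can be installed with the matching command above.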

Usage

1. Zero-shot Detection

Set _ad_setting=1 in config.py to run the "Normal Only" setting; set _ad_setting=2 to run the "Normal + Anomaly" setting.
To run experiments with Llama 3.1: python ad_llama.py
To run experiments with GPT-4o: python ad_gpt.py
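
To illustrate the idea behind the two settings, here is a minimal sketch of how a zero-shot prompt could be assembled and how a reply could be turned into a score. It assumes the settings differ in whether anomaly category names are shown to the model. The functions build_prompt and parse_score, the prompt wording, and the JSON reply format are all hypothetical; the actual prompts live in the prompt folder and may look quite different.

```python
import json
import re


def build_prompt(text, normal_categories, anomaly_categories=None):
    """Build a zero-shot prompt; pass anomaly_categories for "Normal + Anomaly"."""
    lines = [
        "You are an anomaly detector for text.",
        "Normal categories: " + ", ".join(normal_categories) + ".",
    ]
    if anomaly_categories:  # "Normal + Anomaly" setting (assumed meaning)
        lines.append("Anomaly categories: " + ", ".join(anomaly_categories) + ".")
    lines.append('Reply with JSON: {"anomaly_score": <number between 0 and 1>}.')
    lines.append("Text: " + text)
    return "\n".join(lines)


def parse_score(reply):
    """Extract the anomaly score from a model reply, or None if it is unusable."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        return None
    try:
        score = float(json.loads(match.group(0))["anomaly_score"])
    except (ValueError, KeyError, TypeError):
        return None
    return min(1.0, max(0.0, score))  # clamp to [0, 1]


if __name__ == "__main__":
    print(build_prompt("Shares fell sharply.", ["business"], ["sport"]))
    print(parse_score('Sure: {"anomaly_score": 0.8}'))
```

Returning None for unparseable replies keeps malformed outputs separate from genuine low scores, which matters when computing ranking metrics afterwards.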

2. Data Augmentation

1. Synthetic Data Generation

Change _num_keyword_groups_act in config.py to adjust the number of synthetic samples generated per category.
To generate synthetic data: python aug_synth_gpt.py
To run experiments with synthetic data: python baseline_w_gpt_embed.py
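
As a rough illustration of what "running with synthetic data" involves, the sketch below merges generated samples into a real training set while dropping blanks and duplicates. It is not the logic of baseline_w_gpt_embed.py; merge_synthetic and the text/source record layout are hypothetical names chosen for this example.

```python
def merge_synthetic(real_texts, synthetic_texts):
    """Combine real and synthetic samples, dropping blanks and duplicates."""
    seen = set()
    merged = []
    for source, texts in (("real", real_texts), ("synthetic", synthetic_texts)):
        for text in texts:
            cleaned = " ".join(text.split())  # collapse runs of whitespace
            key = cleaned.lower()             # case-insensitive duplicate check
            if not cleaned or key in seen:
                continue
            seen.add(key)
            merged.append({"text": cleaned, "source": source})
    return merged


if __name__ == "__main__":
    real = ["Shares rose today."]
    synthetic = ["shares  rose today.", "", "Bank cuts rates."]
    for row in merge_synthetic(real, synthetic):
        print(row)
```

Keeping a source field makes it easy to report results with and without the synthetic portion.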

2. Category Descriptions Generation

For Llama:
- To generate category descriptions: python aug_desc_llama.py
- To run experiments with category descriptions, set _use_desc = True, then run python ad_llama.py
For GPT:
- To generate category descriptions: python aug_desc_gpt.py
- To run experiments with category descriptions, set _use_desc = True, then run python ad_gpt.py
Remember to set _use_desc back to False when you no longer want to use category descriptions.
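
The effect of a flag like _use_desc can be pictured with the following minimal sketch, which renders a category list with or without descriptions. The function format_categories and the example description are hypothetical and only illustrate the toggle; the repository scripts may wire the flag differently.

```python
def format_categories(categories, descriptions, use_desc):
    """Render the category list for a prompt, with descriptions when enabled."""
    lines = []
    for name in categories:
        # Fall back to the bare name if no description exists for a category.
        desc = descriptions.get(name) if use_desc else None
        lines.append(f"- {name}: {desc}" if desc else f"- {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    descs = {"sport": "Coverage of matches and athletes."}  # made-up description
    print(format_categories(["sport", "tech"], descs, use_desc=True))
    print(format_categories(["sport", "tech"], descs, use_desc=False))
```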

3. Model Selection

To run model selection experiments: python select_gpt.py
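
An LLM recommendation usually arrives as free text, so it has to be mapped back to a concrete model name. The sketch below shows one way this might be done; it is not taken from select_gpt.py. The helper pick_model is hypothetical, and CANDIDATES is an illustrative list of PyOD detector class names rather than the benchmark's actual candidate set.

```python
import re

# Illustrative candidates only (names of PyOD detector classes).
CANDIDATES = ["IForest", "LOF", "ECOD", "OCSVM", "AutoEncoder"]


def pick_model(reply, candidates=CANDIDATES, default=None):
    """Return the candidate mentioned earliest in the reply, else `default`."""
    best = None
    for name in candidates:
        # Whole-word, case-insensitive match so "lof" matches but "aloft" does not.
        match = re.search(rf"\b{re.escape(name)}\b", reply, flags=re.IGNORECASE)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), name)
    return best[1] if best else default


if __name__ == "__main__":
    print(pick_model("I recommend ECOD; LOF is a reasonable fallback."))
```

Restricting the answer to a known candidate list guards against the model naming a detector that is not available in the environment.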

Notes

We provide one example dataset, "BBC News". See NLP-ADBench for more datasets (AG News, IMDB Reviews, N24 News, and SMS Spam) that use the same setting.
