A PyTorch implementation of ACRNet, based on the ICME 2023 paper "Weakly-supervised Temporal Action Localization with Adaptive Clustering and Refining Network".
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install openmim
mim install mmaction2 -f https://github.com/open-mmlab/mmaction2.git
The THUMOS 14 and ActivityNet datasets are used in this repo; download them from their official websites. The RGB and Flow features of these datasets are extracted by dataset.py at 25 FPS. Follow this link to install OpenCV4 with CUDA, then compile denseFlow_GPU and put the executable in this directory. The options can be found in dataset.py; note that this script takes a long time to extract the features. Finally, the I3D features of these datasets are extracted by this repo, with its extract_features.py file replaced by extract.py; the options can be found in extract.py. To make this research friendly, we have uploaded these I3D features to MEGA. You can download them from there; make sure the data directory structure is organized as follows:
├── thumos14                                      | ├── activitynet
    ├── features                                  |     ├── features
        ├── val                                   |         ├── training
            ├── video_validation_0000051_flow.npy |             ├── v___c8enCfzqw_flow.npy
            ├── video_validation_0000051_rgb.npy  |             ├── v___c8enCfzqw_rgb.npy
            └── ...                               |             └── ...
        ├── test                                  |         ├── validation
            ├── video_test_0000004_flow.npy       |             ├── v__1vYKA7mNLI_flow.npy
            ├── video_test_0000004_rgb.npy        |             ├── v__1vYKA7mNLI_rgb.npy
            └── ...                               |             └── ...
    ├── videos                                    |     ├── videos
        ├── val                                   |         ├── training
            ├── video_validation_0000051.mp4      |             ├── v___c8enCfzqw.mp4
            └── ...                               |             └── ...
        ├── test                                  |         ├── validation
            ├── video_test_0000004.mp4            |             ├── v__1vYKA7mNLI.mp4
            └── ...                               |             └── ...
    annotations.json                              |     annotations_1.2.json, annotations_1.3.json
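The layout above pairs every video with one `_rgb.npy` and one `_flow.npy` feature file. As a quick sanity check after downloading, the following minimal sketch lists videos for which only one of the two files is present. The helper name `find_unpaired_features` and the relative path used in the example are assumptions made for illustration; they are not part of this repo.

```python
from pathlib import Path


def find_unpaired_features(split_dir):
    """Return video names in split_dir that lack either the RGB or the flow file."""
    split_dir = Path(split_dir)
    rgb = {p.name[: -len("_rgb.npy")] for p in split_dir.glob("*_rgb.npy")}
    flow = {p.name[: -len("_flow.npy")] for p in split_dir.glob("*_flow.npy")}
    # Symmetric difference: names that appear in exactly one of the two sets.
    return sorted(rgb ^ flow)


if __name__ == "__main__":
    # Assumed location of the data root; adjust to where the data actually lives.
    for split in ("val", "test"):
        folder = Path("thumos14") / "features" / split
        if folder.is_dir():
            print(split, find_unpaired_features(folder))
```

An empty list for each split suggests the features were unpacked completely; any name that is printed is missing its RGB or flow counterpart.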
You can easily train and test the model by running the commands below. If you want to try other options, please refer to utils.py.
python main.py --data_name activitynet1.2 --num_segments 80 --seed 42
python main.py --data_name thumos14 --model_file result/thumos14.pth
The models are trained on one NVIDIA GeForce RTX 3090 GPU (24G). seed is 42 for all datasets; num_seg is 80, alpha is 0.8 and batch_size is 128 for both activitynet1.2 and activitynet1.3; the other hyper-parameters keep their default values.
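To illustrate how the command-line flags shown earlier map to options, here is a minimal sketch of an argument parser. It is hypothetical and is not the parser in utils.py: only the four flag names that appear in the commands above are used, and the types and defaults are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser for illustration only. The flag names come from the
    # commands in this README; types and defaults are assumptions.
    parser = argparse.ArgumentParser(description="ACRNet options (illustrative)")
    parser.add_argument("--data_name", required=True)
    parser.add_argument("--num_segments", type=int, default=None)
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--model_file", default=None)
    return parser


# Parse the same arguments as the ActivityNet 1.2 training command above.
args = build_parser().parse_args(
    ["--data_name", "activitynet1.2", "--num_segments", "80", "--seed", "42"]
)
print(args.data_name, args.num_segments, args.seed, args.model_file)
```

For the authoritative list of options and their real defaults, utils.py remains the reference.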
Results on THUMOS14:

Method | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | mAP@AVG | Download |
---|---|---|---|---|---|---|---|---|---|
ACRNet | 76.7 | 70.7 | 61.0 | 49.0 | 37.0 | 24.8 | 13.4 | 47.5 | MEGA |
mAP@AVG is the average mAP under the thresholds [0.1:0.1:0.7].
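As a worked check of this definition, the sketch below averages the seven per-threshold THUMOS14 values reported for ACRNet. The function name `average_map` is chosen here for illustration and is not taken from the repo.

```python
# Per-threshold mAP values reported for ACRNet on THUMOS14 (tIoU 0.1 to 0.7).
thumos14_map = {
    0.1: 76.7, 0.2: 70.7, 0.3: 61.0, 0.4: 49.0,
    0.5: 37.0, 0.6: 24.8, 0.7: 13.4,
}


def average_map(per_threshold):
    """Plain arithmetic mean of the mAP values over all listed thresholds."""
    return sum(per_threshold.values()) / len(per_threshold)


print(round(average_map(thumos14_map), 1))  # 47.5
```

The sum of the seven values is 332.6, and 332.6 / 7 ≈ 47.51, which matches the reported mAP@AVG of 47.5.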
Results on ActivityNet 1.2 and ActivityNet 1.3:

Method | 1.2 mAP@0.5 | 1.2 mAP@0.75 | 1.2 mAP@0.95 | 1.2 mAP@AVG | 1.3 mAP@0.5 | 1.3 mAP@0.75 | 1.3 mAP@0.95 | 1.3 mAP@AVG | Download |
---|---|---|---|---|---|---|---|---|---|
ACRNet | 46.2 | 28.4 | 5.7 | 28.4 | 40.9 | 26.0 | 5.4 | 25.7 | MEGA |
mAP@AVG is the average mAP under the thresholds [0.5:0.05:0.95].
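The notation [0.5:0.05:0.95] denotes ten temporal IoU (tIoU) thresholds from 0.5 to 0.95 in steps of 0.05. The following minimal sketch enumerates them and computes the tIoU between a predicted and a ground-truth segment. It is an illustration of the standard definition, not the evaluation code used in this repo, and the function names are assumptions.

```python
def tiou_thresholds(start=0.5, stop=0.95, step=0.05):
    """Enumerate thresholds start, start + step, ..., stop (inclusive)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


def temporal_iou(pred, gt):
    """tIoU of two (start, end) segments given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


thresholds = tiou_thresholds()
iou = temporal_iou((2.0, 6.0), (4.0, 8.0))  # 2 s overlap over a 6 s union
# Thresholds at which this prediction would count as a correct localization.
passed = [t for t in thresholds if iou >= t]
print(len(thresholds), round(iou, 3), passed)
```

In this example the overlap is one third of the union, so the prediction falls below even the loosest threshold of 0.5 and `passed` is empty.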