This repo holds the PyTorch code for the paper "Temporal Context Aggregation Network for Temporal Action Proposal Refinement", which was accepted at CVPR 2021.
- 2021.07.02: Update proposals, checkpoints, features for TCANet!
- 2021.05.31: Repository for TCANet
- Paper Introduction
- Other Info
- Prerequisites
- Code and Data Preparation
- Training and Testing of TCANet
Temporal action proposal generation aims to estimate temporal intervals of actions in untrimmed videos, which is a challenging yet important task in the video understanding field. The proposals generated by current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval owing to the lack of efficient temporal modeling and effective boundary context utilization. In this paper, we propose Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary as well as progressive boundary refinement. Specifically, we first design a Local-Global Temporal Encoder (LGTE), which adopts the channel grouping strategy to efficiently encode both "local and global" temporal inter-dependencies. Furthermore, both the boundary and internal context of proposals are adopted for frame-level and segment-level boundary regressions, respectively. Temporal Boundary Regressor (TBR) is designed to combine these two regression granularities in an end-to-end fashion, which achieves the precise boundaries and reliable confidence of proposals through progressive refinement. Extensive experiments are conducted on three challenging datasets: HACS, ActivityNet-v1.3, and THUMOS-14, where TCANet can generate proposals with high precision and recall. By combining with the existing action classifier, TCANet can obtain remarkable temporal action detection performance compared with other methods. Not surprisingly, the proposed TCANet won the 1st place in the CVPR 2020 - HACS challenge leaderboard on temporal action localization task.
This code is implemented in PyTorch 1.5.1 with Python 3.
To clone this repo with git, use:
git clone https://github.com/qingzhiwu/Temporal-Context-Aggregation-Network-Pytorch.git
We currently support experiments on the publicly available HACS dataset for temporal action proposal generation. To download this dataset, please use the official HACS downloader to fetch the videos from YouTube.
To extract visual features, we adopt a SlowFast model pretrained on the training set of HACS. Please refer to the SlowFast repo to extract features.
For convenience of training and testing, we provide the rescaled features on Google Cloud or Baidu Yun [Code: x3ve].
The Baidu Yun link provides:
-- features/: SlowFast features for training, validation and testing.
-- checkpoint/: Pre-trained TCANet model for the SlowFast features provided by us.
-- proposals/: BMN proposals processed by us.
-- classification/: The best classification results, which we used in the paper and in the 2020 HACS challenge.
All configurations of TCANet are saved in opts.py, where you can modify the training and model parameters.
tar -jxvf hacs.bmn.pem.slowfast101.t200.wd1e-5.warmup.pem_input_100.tar.bz2 -C ./
tar -jxvf hacs.bmn.pem.slowfast101.t200.wd1e-5.warmup.pem_input.tar.bz2 -C ./
# for training features
cd features/
cat slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.training.tar.bz2.*>slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.training.tar.gz
tar -zxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.training.tar.gz
tar -jxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.training.tar.bz2 -C .
# for validation features
cd features/
cat slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.validation.tar.bz2.*>slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.validation.tar.gz
tar -zxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.validation.tar.gz
tar -jxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.validation.tar.bz2 -C .
# for testing features
cd features/
cat slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.testing.tar.bz2.*>slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.testing.tar.gz
tar -zxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.testing.tar.gz
tar -jxvf slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.testing.tar.bz2 -C .
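The feature archives above are shipped as split parts that are concatenated before extraction. As a minimal sketch of the same step in Python, the helper below joins the parts and lets `tarfile` detect the compression itself. The function name `reassemble_and_extract` and the `.joined` suffix are hypothetical, and the sketch assumes that the parts are consecutive byte chunks of one archive whose names sort into the right order.

```python
import glob
import os
import shutil
import tarfile


def reassemble_and_extract(prefix, out_dir="."):
    """Join split parts named `<prefix>.*` and extract the merged archive.

    Assumption: the parts are consecutive byte chunks of a single tar
    archive, and sorting their names restores the original order.
    """
    joined = prefix + ".joined"  # hypothetical name for the merged file
    parts = sorted(
        p for p in glob.glob(glob.escape(prefix) + ".*") if p != joined
    )
    if not parts:
        raise FileNotFoundError(f"no parts found for {prefix}")
    with open(joined, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)
    # "r:*" lets tarfile detect gzip or bzip2 compression on its own,
    # so the extension of the merged file does not matter.
    with tarfile.open(joined, "r:*") as archive:
        archive.extractall(out_dir)
    os.remove(joined)
    return parts
```

Called from inside features/, this would be used as `reassemble_and_extract("slowfast101.epoch9.87.52.finetune.pool.t.keep.t.s8.training.tar.bz2")`. Only extract archives from a source you trust, since `extractall` writes whatever paths the archive contains.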
python3 main_tcanet.py --mode train \
--checkpoint_path ./checkpoint/ \
--video_anno /path/to/HACS_segments_v1.1.1.json \
--feature_path /path/to/feature/ \
--train_proposals_path /path/to/pem_input_100/in/proposals \
--test_proposals_path /path/to/pem_input/in/proposals
We also provide a trained TCANet model in ./checkpoint in our Baidu Yun link.
# We split the dataset into 4 parts and run inference on them on 4 GPUs
python3 main_tcanet.py --mode inference --part_idx 0 --gpu 0 --classifier_result /path/to/classifier/{}94.32.json
python3 main_tcanet.py --mode inference --part_idx 1 --gpu 1 --classifier_result /path/to/classifier/{}94.32.json
python3 main_tcanet.py --mode inference --part_idx 2 --gpu 2 --classifier_result /path/to/classifier/{}94.32.json
python3 main_tcanet.py --mode inference --part_idx 3 --gpu 3 --classifier_result /path/to/classifier/{}94.32.json
python3 main_tcanet.py --mode inference --part_idx -1
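The four per-part inference commands above can also be launched together from one small script. The following is a sketch only: the helper names `build_inference_commands` and `run_in_parallel` are hypothetical, the flags are copied from the commands above, and pairing part i with GPU i simply mirrors those commands. The final `--part_idx -1` call is left as a commented usage line, on the assumption that it should run only after the four parts have finished.

```python
import subprocess


def build_inference_commands(classifier_result, num_parts=4):
    """Return one argv list per dataset part, pairing part i with GPU i."""
    return [
        ["python3", "main_tcanet.py", "--mode", "inference",
         "--part_idx", str(i), "--gpu", str(i),
         "--classifier_result", classifier_result]
        for i in range(num_parts)
    ]


def run_in_parallel(commands):
    """Start all commands at once, wait for each, and return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


# Example usage (not executed here):
# codes = run_in_parallel(
#     build_inference_commands("/path/to/classifier/{}94.32.json"))
# if all(code == 0 for code in codes):
#     subprocess.run(["python3", "main_tcanet.py", "--mode", "inference",
#                     "--part_idx", "-1"], check=True)
```

Passing the arguments as a list avoids shell quoting issues, so the `{}` placeholder in the classifier path reaches main_tcanet.py unchanged.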
Please cite the following paper if you find TCANet useful for your research:
@inproceedings{qing2021temporal,
title={Temporal Context Aggregation Network for Temporal Action Proposal Refinement},
author={Qing, Zhiwu and Su, Haisheng and Gan, Weihao and Wang, Dongliang and Wu, Wei and Wang, Xiang and Qiao, Yu and Yan, Junjie and Gao, Changxin and Sang, Nong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={485--494},
year={2021}
}
For any questions, please file an issue or contact
Zhiwu Qing: [email protected]