We provide two types of logs: TensorBoard event files and plain txt files.
python run.py with data_root=DataSet num_gpus=8 \
num_nodes=1 \
num_frames=3 \
per_gpu_batchsize=8 task_finetune_tgifqa \
load_path="pretrained/all-in-one-base.ckpt"
For the TensorBoard log, simply use:
mkdir tensor_log
cp [path_to_provided_logs] tensor_log/
tensorboard --logdir tensor_log
Notice that msrvtt_vqa is a loss name; it corresponds to open-set VQA in the final code.
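As background, open-set VQA is commonly implemented as classification over a fixed answer vocabulary, so the msrvtt_vqa scalar typically tracks a cross-entropy loss. The snippet below is a minimal sketch of that formulation, not the repository's actual head; the vocabulary size, hidden dimension, and layer structure are assumptions.

```python
# Minimal sketch of open-set VQA as classification over an answer vocabulary.
# Illustration only; the actual head and loss in the repository may differ.
import torch
import torch.nn as nn

NUM_ANSWERS = 1500           # assumed size of the answer vocabulary (hypothetical)
HIDDEN_DIM = 768             # assumed fused video-text feature dimension

vqa_head = nn.Sequential(
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM * 2),
    nn.GELU(),
    nn.Linear(HIDDEN_DIM * 2, NUM_ANSWERS),
)

fused = torch.randn(8, HIDDEN_DIM)                    # [batch, dim] fused video+question feature
answers = torch.randint(0, NUM_ANSWERS, (8,))         # ground-truth answer indices
loss = nn.functional.cross_entropy(vqa_head(fused), answers)   # roughly what msrvtt_vqa tracks
```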
TGIF-QA Action/Transition
Modify line 19 in tgifqa to switch between the transition and action sub-tasks.
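What line 19 contains depends on the repository version, so the snippet below is only a hypothetical illustration of the kind of switch to make; the variable name is invented.

```python
# Hypothetical illustration only -- the variable name and file layout may differ in the repo.
# Around line 19 of the tgifqa dataset/config file, pick the sub-task:
TGIF_SUBTASK = "action"      # switch to "transition" for TGIF-QA Transition
```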
python run.py with data_root=DataSet num_gpus=8 \
num_nodes=1 \
num_frames=3 \
per_gpu_batchsize=16 task_finetune_tgif_action_trans \
load_path="pretrained/all-in-one-base.ckpt"
python run.py with data_root=DataSet num_gpus=8 \
num_nodes=1 \
num_frames=3 \
per_gpu_batchsize=16 task_finetune_msrvttqa \
load_path="pretrained/all-in-one-base.ckpt"
python run.py with data_root=DataSet num_gpus=8 \
num_nodes=1 \
num_frames=3 \
per_gpu_batchsize=16 task_finetune_msvdqa \
load_path="pretrained/all-in-one-base.ckpt"
2. Action Recognition (Linear Evaluation)
python run.py \
with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=16 task_finetune_action_recognition_k400 \
num_frames=8 linear_evaluation=True \
load_path="pretrained/all-in-one-base.ckpt"
python run.py \
with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=8 task_finetune_action_recognition_hmdb51 \
num_frames=3 linear_evaluation=True backend='a100' \
load_path="pretrained/all-in-one-base.ckpt"
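For context, linear evaluation (linear probing) typically means the pretrained backbone is frozen and only a linear classifier on top is trained; check the repository config for what linear_evaluation=True toggles exactly. The snippet below is a generic PyTorch sketch of that protocol with assumed module names and dimensions, not the repository's code.

```python
# Generic sketch of linear evaluation (linear probing); not the repository's actual code.
import torch
import torch.nn as nn

NUM_CLASSES = 400            # e.g., Kinetics-400
FEAT_DIM = 768               # assumed backbone feature dimension

# Stand-in for the pretrained video backbone (any frozen feature extractor).
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=FEAT_DIM, nhead=12, batch_first=True),
    num_layers=2,
)
for p in backbone.parameters():                  # freeze every backbone parameter
    p.requires_grad = False

head = nn.Linear(FEAT_DIM, NUM_CLASSES)          # only this layer is trained
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

tokens = torch.randn(8, 16, FEAT_DIM)            # [batch, tokens, dim] frame features
with torch.no_grad():                            # backbone runs in inference mode
    feats = backbone(tokens).mean(dim=1)         # mean-pooled clip representation
loss = nn.functional.cross_entropy(head(feats), torch.randint(0, NUM_CLASSES, (8,)))
loss.backward()
optimizer.step()
```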
3. Multiple-choice (Zero-shot)
The zero-shot tasks directly test the pretrained model without fine-tuning, so these results can be obtained in a few minutes.
python run.py with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=8 task_finetune_lsmdcchoice test_only=True \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| Accuracy | Report in Paper |
|:--------:|:---------------:|
| 56.7 | 56.3 |
Example:
python run.py with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=8 task_finetune_msrvttchoice test_only=True \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| Accuracy | Report in Paper |
|:--------:|:---------------:|
| 78.1 | 77.5 |
python run.py with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=8 task_finetune_ego4dchoice test_only=True \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| Accuracy | Report in Paper |
|:--------:|:---------------:|
| 36.5 | 36.5 |
To speed up retrieval, we get rid of the one-to-one matching of the one-stream network.
With our implementation, retrieval time on MSRVTT drops from 26 hours to 1 hour, with only a slight drop in performance.
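The saving comes from scoring with independently encoded (ITC-style) embeddings: each video and each text is encoded once, and a single similarity matrix ranks all pairs, instead of running a fusion forward pass for every video-text pair. Below is a minimal sketch of that retrieval step with assumed shapes; it is an illustration, not the repository's evaluation code.

```python
# Sketch of ITC-style retrieval with independently encoded embeddings (no pairwise fusion).
# Shapes and the existence of precomputed embeddings are assumptions for illustration.
import torch
import torch.nn.functional as F

num_videos, num_texts, dim = 1000, 1000, 256

video_emb = F.normalize(torch.randn(num_videos, dim), dim=-1)   # one encoder pass per video
text_emb = F.normalize(torch.randn(num_texts, dim), dim=-1)     # one encoder pass per text

# One matrix multiplication scores every pair at once, so the cost is dominated by the
# linear number of encoder passes rather than a fusion pass per (video, text) pair.
sim = text_emb @ video_emb.t()                   # [num_texts, num_videos] cosine similarities
ranks = sim.argsort(dim=-1, descending=True)     # ranked video indices for each text query

# Recall@K for text-to-video retrieval, assuming text i is paired with video i.
gt = torch.arange(num_texts).unsqueeze(1)
for k in (1, 5, 10):
    recall = (ranks[:, :k] == gt).any(dim=1).float().mean()
    print(f"R@{k}: {recall.item():.3f}")
```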
python run.py with \
data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=32 task_finetune_only_ind_itc_msrvtt_randaug \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| R1/R5/R10 | Report in Paper | Trained Log |
|:---------:|:---------------:|:-----------:|
| 35.6/62.8/71.3 | 36.1/62.3/72.1 | Google Drive |
Since VTM needs to sample N false texts, use as much GPU memory as possible (i.e., the largest per-GPU batch that fits) for the best result.
R@1 is not very stable, so run the experiment multiple times and take the best result.
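The GPU-memory advice matters because the false texts are drawn from samples visible within the batch, so a larger per-GPU batch provides more (and harder) negative candidates. The sketch below shows one common way to sample such negatives, weighted by ITC similarity; the names and shapes are assumptions, not the repository's VTM implementation.

```python
# Sketch of sampling N false texts per video for a video-text matching (VTM) head.
# Names and shapes are assumptions; the actual sampling in the repo may differ.
import torch

batch, dim, num_neg = 6, 256, 3            # per-GPU batch size limits the negative pool

video_emb = torch.randn(batch, dim)        # independently encoded video features
text_emb = torch.randn(batch, dim)         # independently encoded text features

# ITC similarity between every video and every text currently in the batch.
sim = video_emb @ text_emb.t()             # [batch, batch]

# A sample's own caption cannot serve as its negative, so zero the diagonal,
# then sample harder (more similar) texts with higher probability.
weights = sim.softmax(dim=-1).clone()
weights.fill_diagonal_(0)
neg_text_idx = torch.multinomial(weights, num_neg)   # [batch, num_neg] false-text indices

false_texts = text_emb[neg_text_idx]       # [batch, num_neg, dim], fed to the matching head
```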
python run.py with \
data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=6 task_finetune_ind_itc_irtr_msrvtt_randaug \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| R1/R5/R10 | Report in Paper | Trained Log |
|:---------:|:---------------:|:-----------:|
| 35.4/67.0/77.6 | 37.1/66.7/75.9 | Google Drive |
By co-training with image datasets:
python run.py with \
data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=32 task_finetune_only_ind_itc_msrvtt_randaug \
num_frames=3 \
load_path="pretrained/all-in-one-baseplus.ckpt"