We provide two types of logs: TensorBoard event files and plain txt files.
python run.py with data_root=DataSet num_gpus=8 \
num_nodes=1 \
num_frames=3 \
per_gpu_batchsize=8 task_finetune_tgifqa \
load_path="pretrained/all-in-one-base.ckpt"
For the TensorBoard log, simply use:
mkdir tensor_log
cp [path_to_provided_logs] tensor_log/
tensorboard --logdir tensor_log
Notice that msrvtt_vqa is a loss name; it corresponds to open-set VQA in the final code.
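As background, open-set VQA is commonly implemented as classification over a fixed answer vocabulary, so the msrvtt_vqa scalar typically tracks a cross-entropy loss. The snippet below is a minimal sketch of that formulation, not the repository's actual head; the vocabulary size, hidden dimension, and layer structure are assumptions.

```python
# Minimal sketch of open-set VQA as classification over an answer vocabulary.
# Illustration only; the actual head and loss in the repository may differ.
import torch
import torch.nn as nn

NUM_ANSWERS = 1500           # assumed size of the answer vocabulary (hypothetical)
HIDDEN_DIM = 768             # assumed fused video-text feature dimension

vqa_head = nn.Sequential(
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM * 2),
    nn.GELU(),
    nn.Linear(HIDDEN_DIM * 2, NUM_ANSWERS),
)

fused = torch.randn(8, HIDDEN_DIM)                    # [batch, dim] fused video+question feature
answers = torch.randint(0, NUM_ANSWERS, (8,))         # ground-truth answer indices
loss = nn.functional.cross_entropy(vqa_head(fused), answers)   # roughly what msrvtt_vqa tracks
```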
TGIF-QA Action/Transition
Modify line 19 in tgifqa to switch between the transition and action sub-tasks.
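What line 19 contains depends on the repository version, so the snippet below is only a hypothetical illustration of the kind of switch to make; the variable name is invented.

```python
# Hypothetical illustration only -- the variable name and file layout may differ in the repo.
# Around line 19 of the tgifqa dataset/config file, pick the sub-task:
TGIF_SUBTASK = "action"      # switch to "transition" for TGIF-QA Transition
```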
python run.py with data_root=DataSet num_gpus=8 \
num_nodes=1 \
num_frames=3 \
per_gpu_batchsize=16 task_finetune_tgif_action_trans \
load_path="pretrained/all-in-one-base.ckpt"
python run.py with data_root=DataSet num_gpus=8 \
num_nodes=1 \
num_frames=3 \
per_gpu_batchsize=16 task_finetune_msrvttqa \
load_path="pretrained/all-in-one-base.ckpt"
python run.py with data_root=DataSet num_gpus=8 \
num_nodes=1 \
num_frames=3 \
per_gpu_batchsize=16 task_finetune_msvdqa \
load_path="pretrained/all-in-one-base.ckpt"
2. Action Recognition (Linear Evaluation)
python run.py \
with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=16 task_finetune_action_recognition_k400 \
num_frames=8 linear_evaluation=True \
load_path="pretrained/all-in-one-base.ckpt"
python run.py \
with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=8 task_finetune_action_recognition_hmdb51 \
num_frames=3 linear_evaluation=True backend='a100' \
load_path="pretrained/all-in-one-base.ckpt"
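For context, linear evaluation (linear probing) typically means the pretrained backbone is frozen and only a linear classifier on top is trained; check the repository config for what linear_evaluation=True toggles exactly. The snippet below is a generic PyTorch sketch of that protocol with assumed module names and dimensions, not the repository's code.

```python
# Generic sketch of linear evaluation (linear probing); not the repository's actual code.
import torch
import torch.nn as nn

NUM_CLASSES = 400            # e.g., Kinetics-400
FEAT_DIM = 768               # assumed backbone feature dimension

# Stand-in for the pretrained video backbone (any frozen feature extractor).
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=FEAT_DIM, nhead=12, batch_first=True),
    num_layers=2,
)
for p in backbone.parameters():                  # freeze every backbone parameter
    p.requires_grad = False

head = nn.Linear(FEAT_DIM, NUM_CLASSES)          # only this layer is trained
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

tokens = torch.randn(8, 16, FEAT_DIM)            # [batch, tokens, dim] frame features
with torch.no_grad():                            # backbone runs in inference mode
    feats = backbone(tokens).mean(dim=1)         # mean-pooled clip representation
loss = nn.functional.cross_entropy(head(feats), torch.randint(0, NUM_CLASSES, (8,)))
loss.backward()
optimizer.step()
```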
3. Multiple-choice (Zero-shot)
The zero-shot tasks directly test the pretrained model without fine-tuning, so these results can be obtained in a few minutes.
python run.py with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=8 task_finetune_lsmdcchoice test_only=True \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| Accuracy | Report in Paper |
|:--------:|:---------------:|
| 56.7 | 56.3 |
Example:
python run.py with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=8 task_finetune_msrvttchoice test_only=True \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| Accuracy | Report in Paper |
|:--------:|:---------------:|
| 78.1 | 77.5 |
python run.py with data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=8 task_finetune_ego4dchoice test_only=True \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| Accuracy | Report in Paper |
|:--------:|:---------------:|
| 36.5 | 36.5 |
To speed up retrieval, we get rid of the one-to-one matching of the one-stream network.
With our implementation, retrieval time on MSRVTT drops from 26 hours to 1 hour, with only a slight drop in performance.
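The saving comes from scoring with independently encoded (ITC-style) embeddings: each video and each text is encoded once, and a single similarity matrix ranks all pairs, instead of running a fusion forward pass for every video-text pair. Below is a minimal sketch of that retrieval step with assumed shapes; it is an illustration, not the repository's evaluation code.

```python
# Sketch of ITC-style retrieval with independently encoded embeddings (no pairwise fusion).
# Shapes and the existence of precomputed embeddings are assumptions for illustration.
import torch
import torch.nn.functional as F

num_videos, num_texts, dim = 1000, 1000, 256

video_emb = F.normalize(torch.randn(num_videos, dim), dim=-1)   # one encoder pass per video
text_emb = F.normalize(torch.randn(num_texts, dim), dim=-1)     # one encoder pass per text

# One matrix multiplication scores every pair at once, so the cost is dominated by the
# linear number of encoder passes rather than a fusion pass per (video, text) pair.
sim = text_emb @ video_emb.t()                   # [num_texts, num_videos] cosine similarities
ranks = sim.argsort(dim=-1, descending=True)     # ranked video indices for each text query

# Recall@K for text-to-video retrieval, assuming text i is paired with video i.
gt = torch.arange(num_texts).unsqueeze(1)
for k in (1, 5, 10):
    recall = (ranks[:, :k] == gt).any(dim=1).float().mean()
    print(f"R@{k}: {recall.item():.3f}")
```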
python run.py with \
data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=32 task_finetune_only_ind_itc_msrvtt_randaug \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| R1/R5/R10 | Report in Paper | Trained Log |
|:---------:|:---------------:|:-----------:|
| 35.6/62.8/71.3 | 36.1/62.3/72.1 | Google Drive |
Since VTM needs to sample N false texts, use as much GPU memory as possible (i.e., the largest per-GPU batch that fits) for the best result.
R@1 is not very stable, so run the experiment multiple times and take the best result.
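The GPU-memory advice matters because the false texts are drawn from samples visible within the batch, so a larger per-GPU batch provides more (and harder) negative candidates. The sketch below shows one common way to sample such negatives, weighted by ITC similarity; the names and shapes are assumptions, not the repository's VTM implementation.

```python
# Sketch of sampling N false texts per video for a video-text matching (VTM) head.
# Names and shapes are assumptions; the actual sampling in the repo may differ.
import torch

batch, dim, num_neg = 6, 256, 3            # per-GPU batch size limits the negative pool

video_emb = torch.randn(batch, dim)        # independently encoded video features
text_emb = torch.randn(batch, dim)         # independently encoded text features

# ITC similarity between every video and every text currently in the batch.
sim = video_emb @ text_emb.t()             # [batch, batch]

# A sample's own caption cannot serve as its negative, so zero the diagonal,
# then sample harder (more similar) texts with higher probability.
weights = sim.softmax(dim=-1).clone()
weights.fill_diagonal_(0)
neg_text_idx = torch.multinomial(weights, num_neg)   # [batch, num_neg] false-text indices

false_texts = text_emb[neg_text_idx]       # [batch, num_neg, dim], fed to the matching head
```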
python run.py with \
data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=6 task_finetune_ind_itc_irtr_msrvtt_randaug \
num_frames=3 \
load_path="pretrained/all-in-one-base.ckpt"
| R1/R5/R10 | Report in Paper | Trained Log |
|:---------:|:---------------:|:-----------:|
| 35.4/67.0/77.6 | 37.1/66.7/75.9 | Google Drive |
By co-training with image datasets:
python run.py with \
data_root=DataSet num_gpus=8 num_nodes=1 \
per_gpu_batchsize=32 task_finetune_only_ind_itc_msrvtt_randaug \
num_frames=3 \
load_path="pretrained/all-in-one-baseplus.ckpt"