Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

This repository contains the PyTorch implementation of our paper Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning (CVPR 2020).

Prerequisites

Python 3 and PyTorch 1.3.

# clone the repository
git clone git@github.com:cshizhe/hgr_v2t.git
cd hgr_v2t
export PYTHONPATH=$(pwd):${PYTHONPATH}

Datasets

We provide annotations and pretrained features for the MSRVTT, TGIF, VATEX and Youtube2Text video captioning datasets, which can be downloaded from BaiduNetdisk (code: vxpi).

Annotations

groundtruth: annotation/RET directory

ref_captions.json: dict, {videoname: [sent]}
sent2rolegraph.augment.json: {sent: (graph_nodes, graph_edges)}
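
A minimal Python sketch for loading the two ground-truth files and checking that every reference caption has a role graph. It relies only on the formats listed above; the helper names (`load_annotations`, `captions_without_graph`) and the toy file contents are hypothetical, and nothing is assumed about the internal layout of `graph_nodes` or `graph_edges`.

```python
import json
import os
import tempfile


def load_annotations(ann_dir):
    """Load the two ground-truth files from an annotation/RET directory."""
    with open(os.path.join(ann_dir, "ref_captions.json")) as f:
        ref_captions = json.load(f)  # {videoname: [sent, ...]}
    with open(os.path.join(ann_dir, "sent2rolegraph.augment.json")) as f:
        # If each (graph_nodes, graph_edges) pair is stored as a JSON array,
        # it loads back as a 2-item list rather than a tuple.
        sent2rolegraph = json.load(f)
    return ref_captions, sent2rolegraph


def captions_without_graph(ref_captions, sent2rolegraph):
    """List (videoname, sent) pairs whose caption has no role graph entry."""
    return [
        (name, sent)
        for name, sents in ref_captions.items()
        for sent in sents
        if sent not in sent2rolegraph
    ]


if __name__ == "__main__":
    # Toy stand-ins for the real files; the empty node/edge lists are placeholders.
    with tempfile.TemporaryDirectory() as ann_dir:
        with open(os.path.join(ann_dir, "ref_captions.json"), "w") as f:
            json.dump({"video0": ["a man is cooking", "a man cooks food"]}, f)
        with open(os.path.join(ann_dir, "sent2rolegraph.augment.json"), "w") as f:
            json.dump({"a man is cooking": [[], []]}, f)
        caps, graphs = load_annotations(ann_dir)
        print(captions_without_graph(caps, graphs))  # [('video0', 'a man cooks food')]
```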

vocabularies: annotation/RET directory

int2word.npy: [word]
word2int.json: {word: int}

data splits: public_split directory

trn_names.npy
val_names.npy
tst_names.npy
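
The vocabulary files can turn a caption into word ids, and the split files list the video names of each split. This is a sketch only: the helper names are hypothetical, and both the lowercase/whitespace tokenization and the `<unk>` token used for out-of-vocabulary words are assumptions, not necessarily what the repository does.

```python
import json
import os
import tempfile

import numpy as np


def load_vocab(ann_dir):
    """Load int2word.npy ([word], indexed by id) and word2int.json ({word: int})."""
    # allow_pickle in case the words were saved as an object array
    int2word = np.load(os.path.join(ann_dir, "int2word.npy"), allow_pickle=True)
    with open(os.path.join(ann_dir, "word2int.json")) as f:
        word2int = json.load(f)
    return [str(w) for w in int2word], word2int


def encode(sent, word2int, unk_id):
    """Map a caption to word ids; lowercase + whitespace split is an assumption."""
    return [word2int.get(w, unk_id) for w in sent.lower().split()]


def load_split(split_dir, split):
    """Load the video names of one split: 'trn', 'val' or 'tst'."""
    names = np.load(os.path.join(split_dir, f"{split}_names.npy"), allow_pickle=True)
    return [str(n) for n in names]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        words = ["<unk>", "a", "man", "cooks"]  # toy vocabulary; "<unk>" is hypothetical
        np.save(os.path.join(d, "int2word.npy"), np.array(words))
        with open(os.path.join(d, "word2int.json"), "w") as f:
            json.dump({w: i for i, w in enumerate(words)}, f)
        np.save(os.path.join(d, "tst_names.npy"), np.array(["video7", "video9"]))
        int2word, word2int = load_vocab(d)
        ids = encode("A man dances", word2int, unk_id=word2int["<unk>"])
        print(ids, [int2word[i] for i in ids])  # [1, 2, 0] ['a', 'man', '<unk>']
        print(load_split(d, "tst"))  # ['video7', 'video9']
```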

Features

For the MSRVTT, TGIF and Youtube2Text datasets, we extract features with ResNet-152 pretrained on ImageNet. For the VATEX dataset, we use the I3D features released by the VATEX challenge organizers.

mean pooling features: ordered_feature/MP directory

format: np array, shape=(num_fts, dim_ft); rows follow the order of the names in the data split files
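
Because the mean-pooling array stores no names, rows have to be matched to videos through the split order. A minimal sketch, assuming the feature file and the corresponding `*_names.npy` file belong to the same split; `mp_features_by_name` is a hypothetical helper, and the feature file names inside ordered_feature/MP are not given here.

```python
import numpy as np


def mp_features_by_name(mp_fts, names):
    """Pair each row of a (num_fts, dim_ft) mean-pooling array with its video name.

    Row i is assumed to belong to names[i], where names comes from the matching
    *_names.npy split file.
    """
    mp_fts = np.asarray(mp_fts)
    if mp_fts.ndim != 2 or len(mp_fts) != len(names):
        raise ValueError(f"expected {len(names)} rows, got shape {mp_fts.shape}")
    return {name: mp_fts[i] for i, name in enumerate(names)}


if __name__ == "__main__":
    names = ["video7", "video9"]  # e.g. loaded from tst_names.npy
    mp_fts = np.random.rand(len(names), 2048).astype(np.float32)  # toy stand-in
    by_name = mp_features_by_name(mp_fts, names)
    print(by_name["video9"].shape)  # (2048,)
```

With real data, `mp_fts` would come from `np.load` on a file in ordered_feature/MP and `names` from the matching split file; the shape check catches a feature file paired with the wrong split.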

frame-level features: ordered_feature/SA directory

format: hdf5 file, {name: ft}, ft.shape=(num_frames, dim_ft)
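
Frame-level features have a different number of frames per video, so batching them needs subsampling or padding. The sketch below is a hypothetical helper, not the repository's loader: it uniformly subsamples long videos, zero-pads short ones to `max_frames` (an assumed parameter), and returns a boolean mask of real frames. It accepts any `{name: ft}` mapping, so an open `h5py.File` can be passed directly.

```python
import numpy as np


def pad_frames(ft, max_frames):
    """Fit a (num_frames, dim_ft) array to exactly max_frames rows.

    Longer videos are uniformly subsampled; shorter ones are zero-padded.
    Returns the fitted array and a boolean mask marking real (non-padded) frames.
    """
    ft = np.asarray(ft)
    if len(ft) > max_frames:
        idxs = np.linspace(0, len(ft) - 1, max_frames).round().astype(int)
        ft = ft[idxs]
    n = len(ft)
    padded = np.zeros((max_frames, ft.shape[1]), dtype=ft.dtype)
    padded[:n] = ft
    mask = np.zeros(max_frames, dtype=bool)
    mask[:n] = True
    return padded, mask


def load_batch(frame_fts, names, max_frames):
    """Stack padded features for several videos from any {name: ft} mapping."""
    pairs = [pad_frames(frame_fts[name], max_frames) for name in names]
    fts = np.stack([p[0] for p in pairs])    # (batch, max_frames, dim_ft)
    masks = np.stack([p[1] for p in pairs])  # (batch, max_frames)
    return fts, masks


if __name__ == "__main__":
    toy = {"video0": np.ones((10, 4), np.float32), "video1": np.ones((3, 4), np.float32)}
    fts, masks = load_batch(toy, ["video0", "video1"], max_frames=5)
    print(fts.shape, masks.sum(axis=1))  # (2, 5, 4) [5 3]
```

With real data, one could write `with h5py.File(path, "r") as f: fts, masks = load_batch(f, names, max_frames)`, where `path` points to a file in ordered_feature/SA. Uniform subsampling keeps the temporal coverage of long videos, whereas plain truncation would drop their endings.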

Fine-grained Binary Selection Annotation

We construct the fine-grained binary selection dataset based on the test set of the Youtube2Text dataset. The annotations are in the Youtube2Text/annotation/binary_selection directory.
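
Presumably each test video in the binary selection task is paired with its correct caption and a fine-grained distractor, and the model must pick the correct one. Under that assumption, accuracy is how often the correct caption receives the higher similarity. The annotation file format is not described here, so this hypothetical helper takes precomputed scores.

```python
import numpy as np


def binary_selection_accuracy(pos_scores, neg_scores):
    """Fraction of videos whose correct caption outscores its distractor.

    pos_scores[i], neg_scores[i]: model similarity of video i with its correct
    caption and with its fine-grained negative caption. Ties count as errors.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    if pos.shape != neg.shape:
        raise ValueError("pos_scores and neg_scores must have the same shape")
    return float(np.mean(pos > neg))


if __name__ == "__main__":
    print(binary_selection_accuracy([0.9, 0.2, 0.7, 0.5], [0.1, 0.3, 0.4, 0.5]))  # 0.5
```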

Training & Inference

Semantic Graph Construction

We provide the constructed role graph annotations. To generate role graphs for new datasets, follow the instructions below.

semantic role labeling:

python misc/semantic_role_labeling.py ref_caption_file out_file --cuda_device 0

convert sentences into role graphs:

cd misc
jupyter notebook
# open parse_sent_to_role_graph.ipynb
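
The notebook's conversion logic is not reproduced here. As a rough illustration of the idea only, the sketch below turns SRL output into a simple graph: one node per predicate, one node per argument span, and an edge from the predicate to each argument labeled with its role. The input format (AllenNLP-style `words` plus one BIO `tags` list per predicate) is an assumption about what semantic_role_labeling.py writes; the node and edge layout is hypothetical, is not the `(graph_nodes, graph_edges)` format of sent2rolegraph.augment.json, and does not distinguish the node types (e.g. actions vs. entities) that a hierarchical role graph would.

```python
def bio_spans(words, tags):
    """Group BIO tags (e.g. B-ARG0, I-ARG0, O) into (label, phrase) spans."""
    spans, label, toks = [], None, []
    for word, tag in zip(words, tags):
        starts_new = tag.startswith("B-") or tag == "O" or tag[2:] != label
        if starts_new:
            if label is not None:
                spans.append((label, " ".join(toks)))
            label, toks = (None, []) if tag == "O" else (tag[2:], [word])
        else:  # I-<label> continuing the current span
            toks.append(word)
    if label is not None:
        spans.append((label, " ".join(toks)))
    return spans


def srl_to_role_graph(srl):
    """Convert one sentence's SRL output into (nodes, edges).

    Assumed input: {"words": [...], "verbs": [{"verb": str, "tags": [...]}]}.
    Nodes are (node_type, text); edges are (verb_idx, arg_idx, role).
    """
    nodes, edges = [], []
    for frame in srl["verbs"]:
        verb_idx = len(nodes)
        nodes.append(("verb", frame["verb"]))
        for label, phrase in bio_spans(srl["words"], frame["tags"]):
            if label == "V":
                continue  # the predicate itself is already a node
            nodes.append(("arg", phrase))
            edges.append((verb_idx, len(nodes) - 1, label))
    return nodes, edges


if __name__ == "__main__":
    srl = {"words": ["a", "man", "is", "slicing", "a", "tomato"],
           "verbs": [{"verb": "slicing",
                      "tags": ["B-ARG0", "I-ARG0", "O", "B-V", "B-ARG1", "I-ARG1"]}]}
    nodes, edges = srl_to_role_graph(srl)
    print(nodes)  # [('verb', 'slicing'), ('arg', 'a man'), ('arg', 'a tomato')]
    print(edges)  # [(0, 1, 'ARG0'), (0, 2, 'ARG1')]
```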

Training and Evaluation

The baseline VSE++ model:

cd t2vretrieval/driver

# setup config files
# you should modify data paths in configs/prepare_globalmatch_configs.py
python configs/prepare_globalmatch_configs.py $datadir
resdir='' # copy the output string of the previous step

# training
python global_match.py $resdir/model.json $resdir/path.json --is_train --resume_file $resdir/../../word_embeds.glove42b.th

# inference
python global_match.py $resdir/model.json $resdir/path.json --eval_set tst
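
The VSE++ baseline is commonly trained with a hinge-based triplet ranking loss that keeps only the hardest negative in each mini-batch. A NumPy sketch of that loss, for illustration only: the function name and the default `margin=0.2` are assumptions, and the repository's PyTorch implementation may differ in details such as the margin or how the two directions are combined.

```python
import numpy as np


def vsepp_max_violation_loss(sims, margin=0.2):
    """Hinge triplet loss with hardest negatives (VSE++-style).

    sims: (n, n) similarity matrix; sims[i, j] scores video i against caption j,
    and the matching pairs lie on the diagonal.
    """
    sims = np.asarray(sims, dtype=float)
    pos = np.diag(sims)
    off_diag = ~np.eye(len(sims), dtype=bool)
    # Violations by negative captions j for each video i (row-wise)
    cost_c = np.where(off_diag, np.maximum(0.0, margin + sims - pos[:, None]), 0.0)
    # Violations by negative videos j for each caption i (column-wise)
    cost_v = np.where(off_diag, np.maximum(0.0, margin + sims - pos[None, :]), 0.0)
    # Keep only the hardest negative per query, then sum over the batch
    return float(cost_c.max(axis=1).sum() + cost_v.max(axis=0).sum())


if __name__ == "__main__":
    print(round(vsepp_max_violation_loss([[0.5, 0.6], [0.2, 0.4]]), 4))  # 0.7
```

Taking the maximum rather than the sum over negatives focuses training on the most confusable pair in the batch, which is the distinguishing choice of VSE++.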

Our HGR model:

cd t2vretrieval/driver

# setup config files
# you should modify data paths in configs/prepare_mlmatch_configs.py
python configs/prepare_mlmatch_configs.py $datadir
resdir='' # copy the output string of the previous step

# training
python multilevel_match.py $resdir/model.json $resdir/path.json --load_video_first --is_train --resume_file $resdir/../../word_embeds.glove42b.th

# inference
python multilevel_match.py $resdir/model.json $resdir/path.json --load_video_first --eval_set tst
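
Text-to-video retrieval is usually reported as Recall@K and median rank. The sketch below computes them from a caption-by-video similarity matrix and, for a multi-level model, first averages the per-level matrices (e.g. event, action and entity level). Both the simple averaging and the helper names are assumptions for illustration, not necessarily what multilevel_match.py does.

```python
import numpy as np


def fuse_levels(level_sims):
    """Average per-level (num_captions, num_videos) similarity matrices."""
    return np.mean(np.stack([np.asarray(s, dtype=float) for s in level_sims]), axis=0)


def text_to_video_ranks(sims, gt):
    """1-based rank of the correct video for each caption.

    sims[i, j]: score of caption i against video j; gt[i]: index of caption i's video.
    """
    sims = np.asarray(sims, dtype=float)
    gt = np.asarray(gt)
    pos = sims[np.arange(len(sims)), gt][:, None]
    # Rank = 1 + number of videos scored strictly higher (ties favor the correct one)
    return 1 + (sims > pos).sum(axis=1)


def retrieval_metrics(sims, gt, ks=(1, 5, 10)):
    """Recall@K in percent and median rank for text-to-video retrieval."""
    ranks = text_to_video_ranks(sims, gt)
    metrics = {f"R@{k}": 100.0 * float(np.mean(ranks <= k)) for k in ks}
    metrics["MedR"] = float(np.median(ranks))
    return metrics


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy per-level scores for 4 captions x 3 videos; captions 0 and 1 describe video 0
    levels = [rng.random((4, 3)) for _ in range(3)]
    gt = [0, 0, 1, 2]
    print(retrieval_metrics(fuse_levels(levels), gt, ks=(1, 5)))
```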

Citations

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@article{chen2020fine,
  title={Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning},
  author={Chen, Shizhe and Zhao, Yida and Jin, Qin and Wu, Qi},
  journal={CVPR},
  year={2020}
}

License

MIT License
