This repository contains the dataset, codebase, and benchmarks for our paper "CNVid-3.5M: Build, Filter, and Pre-train the Large-scale Public Chinese Video-text Dataset", which has been accepted by CVPR 2023.
- The codebase is now available at AntMMF-CNVid-VTP.
- MediaCrawler is another project that may help you download the original videos in the CNVid-3.5M dataset.
- If you have any questions about CNVid-3.5M, please raise them at the AntMMF project.
Please first check TERMS.md, LEGAL.md, and LICENSE. Do not use the content of this dataset if you do not agree to the terms, legal disclaimer, and license outlined in these files.
Note that we do not own the copyright to any of the collected data. The distribution of identities and activities in the CNVid-3.5M dataset may not be representative of the global human population and the diversity of society. Please be mindful of unintended societal, gender, racial, and other biases when training or deploying models on this data.
CNVid-3.5M is a large-scale public cross-modal dataset containing over 3.5 million Chinese video-text pairs. We summarize our contributions with three verbs, i.e., "Build", "Filter", and "Pre-train": 1) To build a public Chinese video-text dataset, we collect over 4.5M videos from Chinese websites. 2) To improve the data quality, we propose a novel method to filter out 1M weakly-paired videos, resulting in the CNVid-3.5M dataset. 3) To demonstrate the value of the dataset, we pre-train video-text models on CNVid-3.5M and provide the corresponding benchmarks.
Check DATASET.md for instructions on downloading and preprocessing the CNVid-3.5M dataset.
Check BENCHMARK.md for instructions on downloading the CNVid-3.5M benchmarks and fine-tuning models.
The benchmarks are already prepared, but we still need some time to obtain external disclosure authorization from our group. All benchmarks are planned for release in January 2024.
If you find CNVid-3.5M useful, please consider citing the following paper:
@inproceedings{gan2023cnvid,
  title={CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset},
  author={Gan, Tian and Wang, Qing and Dong, Xingning and Ren, Xiangyuan and Nie, Liqiang and Guo, Qingpei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14815--14824},
  year={2023}
}