get_divide_dataset

Data Release Note

v1.0.0: merge_divide_and_maxwell_splits

The DIVIDE-3k dataset was originally proposed in DOVER as the first dataset with aesthetic and technical labels in addition to overall labels. For this dataset, we invited trained subjects to conduct an in-lab subjective study.

In the ExplainableVQA project, we further invited the same group of trained subjects to provide 13 dimensions of explanation-level labels for DIVIDE-3k (i.e., the MaxWell database). These subjects also labeled around 1K new videos to expand the database.

As the videos in MaxWell are a superset of the videos in DIVIDE-3k, after internal discussion our team decided to unify the train and validation splits of the two datasets and re-run DOVER and DOVER++ on the merged DIVIDE-MaxWell database (a superset of DIVIDE-3k, with each video labeled with aesthetic, technical, and overall perspective scores), using the same official split as provided in MaxWell, i.e., 3634 training videos (80%) and 909 validation videos. The results of both variants are shown below, and the model checkpoint of DOVER++ (trained on the official train/test split) will be uploaded soon to facilitate further research.

Download Videos

Download the videos from Hugging Face Datasets.
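If you prefer a programmatic download, the videos can also be fetched with the huggingface_hub client. A minimal sketch is below; the repository id is a placeholder and should be replaced with the actual dataset repository linked above.

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<org>/<divide-maxwell-videos>",  # placeholder, use the actual dataset repo id
    repo_type="dataset",
    local_dir="./divide_maxwell_videos",
)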

Labels

The labels are provided here for the official training set and the official validation set.
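Assuming the labels are distributed as per-split CSV files with one row per video and one column per perspective score (the file names and column names below are illustrative, not the official ones), they could be loaded roughly as follows:

import pandas as pd

# Illustrative file names; point these at the actual label files for the
# official training / validation splits.
train_labels = pd.read_csv("train_labels.csv")
val_labels = pd.read_csv("val_labels.csv")

# Column names assumed for illustration: one score per perspective.
print(train_labels[["aesthetic", "technical", "overall"]].describe())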

Training for DOVER++

To run DOVER++ (which enhances end-to-end training with aesthetic, technical, and overall scores), use the following script:

python training_with_divide.py --train train-dividemaxwell --val val-dividemaxwell

Results on the Updated Train-Test Splits

As we have changed the train-test split, the results for FAST-VQA (the technical branch of DOVER), DOVER, and DOVER++ have also changed. Their results are listed below.

Zero-Shot

VQA Approaches

  • DOVER (pre-trained on LSVQ)

SROCC: 0.7477 | PLCC: 0.7546 | KROCC: 0.5510

  • FAST-VQA (i.e., the technical branch of DOVER, pre-trained on LSVQ)

SROCC: 0.7204 | PLCC: 0.7282 | KROCC: 0.5286

  • Aesthetic Branch in DOVER (pre-trained on LSVQ)

SROCC: 0.7184 | PLCC: 0.7293 | KROCC: 0.5249

IQA Approaches

  • SAQI (CLIP-ResNet-50)

SROCC: 0.5518 | PLCC: 0.5549 | KROCC: 0.3814

  • NIQE

SROCC: 0.2847 | PLCC: 0.3014 | KROCC: 0.2150

SROCC: 0.6821 | PLCC: 0.6923 | KROCC: 0.4949

Fine-tuned

Baseline Methods

  • VSFA (Li et al., 2019, trained on the training set of DIVIDE-MaxWell, only the overall score used)

SROCC: 0.6671 | PLCC: 0.6784 | KROCC: 0.4875

  • BVQA (Zhang et al., 2022, trained on the training set of DIVIDE-MaxWell, only the overall score used)

SROCC: 0.7418 | PLCC: 0.7394 | KROCC: 0.5341

  • FAST-VQA (Wu et al., 2022, trained on the training set of DIVIDE-MaxWell, only the overall score used)

SROCC: 0.7798 | PLCC: 0.7819 | KROCC: 0.5868

Methods Proposed with DIVIDE or MaxWell

  • MaxVQA (trained on the training set of DIVIDE-MaxWell, 16 dimensions used)

SROCC: 0.8044 | PLCC: 0.8131 | KROCC: 0.6098

  • DOVER++ (trained on the training set of DIVIDE-MaxWell, 3 dimensions used)

SROCC: 0.8071 | PLCC: 0.8126 | KROCC: 0.6136

All fine-tuned methods involve training, so their results may contain some randomness.
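For reference, the correlation metrics reported above (SROCC, PLCC, and KROCC) can be computed with scipy; a minimal sketch on dummy scores:

import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

gt = np.array([3.2, 1.5, 4.1, 2.8, 3.9])    # ground-truth opinion scores (dummy values)
pred = np.array([3.0, 1.8, 4.3, 2.5, 3.7])  # model predictions (dummy values)

srocc, _ = spearmanr(pred, gt)    # monotonic (rank) correlation
plcc, _ = pearsonr(pred, gt)      # linear correlation
krocc, _ = kendalltau(pred, gt)   # pairwise ordering consistency

print(f"SROCC: {srocc:.4f} | PLCC: {plcc:.4f} | KROCC: {krocc:.4f}")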