Training/Validation Data Split #27

Boese0601 · 2023-07-30T04:15:58Z

Hi, thanks for your great work. I checked the TikTok TSV dataset and found that you have already split it into a training set and a validation set. Since it is not easy to match each image back to its original sequence ID, could you please clarify which sequences from the original TikTok dataset (000 to 340) are used for training and which for validation? Thanks!

Wangt-CN · 2023-08-04T07:52:47Z

Hi @Boese0601 , thanks for your nice comment.

For training, we use sequences 000-334 of the TikTok dataset. For testing, we found a potential risk of person ID leakage in the TikTok dataset. Therefore, we chose to collect 10 short videos from both the 335-340 sequences and the Internet to make sure that no person IDs coincide, for a fair comparison.
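
A minimal sketch of the ID ranges described above, assuming sequences are identified by zero-padded three-digit IDs from 000 to 340. The function name `split_tiktok_sequences` and its parameters are hypothetical and not taken from the repository's code. The held-out range is only a candidate pool: the 10 test videos are drawn from it together with Internet videos that are not listed in this thread.

```python
def split_tiktok_sequences(first_id=0, last_train_id=334, last_id=340):
    """Return (train_ids, heldout_ids) as zero-padded three-digit strings.

    Assumption: sequence IDs run contiguously from first_id to last_id.
    """
    # Training sequences: 000 through 334 inclusive.
    train_ids = [f"{i:03d}" for i in range(first_id, last_train_id + 1)]
    # Held-out candidates: 335 through 340 inclusive.
    heldout_ids = [f"{i:03d}" for i in range(last_train_id + 1, last_id + 1)]
    return train_ids, heldout_ids


train_ids, heldout_ids = split_tiktok_sequences()
print(len(train_ids), train_ids[0], train_ids[-1])
print(heldout_ids)
```

If a local copy of the dataset starts at 001 instead of 000, `first_id=1` can be passed, though whether the numbering then still lines up with the ranges above is not confirmed in this thread.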

Boese0601 · 2023-08-06T02:04:03Z

Thanks for your kind reply! That makes things clear. Btw could you please also upload those additional video sequences collected from the internet to Google Drive?

Wangt-CN · 2023-09-12T20:57:33Z

Hi @Boese0601 , I have submitted a request to the corporation to open-source the additional TikTok-style data. Since the data was collected by the corporation, we need to get its permission first.
Currently, if you want to make a fair comparison, you can follow the penultimate line of Table 1, which does not use the additional data for training.

xianrui-luo · 2023-12-21T08:10:38Z

Hi, I downloaded the TSV file and found that it contains additional data. So for the penultimate line of Table 1, you did not use the TSV file you provided and only used the 335-340 sequences for evaluation. Is that correct?

Wangt-CN · 2023-12-21T08:16:09Z

@notorious-eric Hi, do you mean the evaluation data? All the models are evaluated on the same data, i.e., 10 videos that combine the original TikTok test sequences and the additional data.

Kelu007 · 2024-05-08T09:42:11Z

> Hi @Boese0601 , thanks for your nice comment.
>
> For training, we use sequences 000-334 of the TikTok dataset. For testing, we found a potential risk of person ID leakage in the TikTok dataset. Therefore, we chose to collect 10 short videos from both the 335-340 sequences and the Internet to make sure that no person IDs coincide, for a fair comparison.

What are the videos collected from the Internet for evaluation?

zhuochen02 · 2024-10-19T07:35:05Z

> Hi @Boese0601 , thanks for your nice comment.
>
> For training, we use sequences 000-334 of the TikTok dataset. For testing, we found a potential risk of person ID leakage in the TikTok dataset. Therefore, we chose to collect 10 short videos from both the 335-340 sequences and the Internet to make sure that no person IDs coincide, for a fair comparison.

Why does the dataset I downloaded start at 001 and have no 000?

fwbx529 mentioned this issue Aug 11, 2023

Video frame 'expand' when performing FVD #36

Closed

Wangt-CN closed this as completed Sep 12, 2023

Kelu007 mentioned this issue May 8, 2024

Train & Test Data Split fudan-generative-vision/champ#112

Open
