Training/Validation Data Split #27
Comments
Hi @Boese0601, thanks for your nice comment. For training, we use sequences 000-334 of the TikTok dataset. For testing, we found a potential risk of person ID leakage in the TikTok dataset. Therefore, we collected 10 short videos from both the 335-340 sequences and the Internet so that no person IDs coincide, for a fair comparison.
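For readers who want to reproduce the split from the original TikTok sequences, here is a minimal sketch of the ID ranges described in the reply above. The function names and the `"test_candidate"` label are hypothetical and not part of the repository; the sketch only assumes that sequences are identified by zero-padded numeric strings such as `"000"`. Note that sequences 335-340 are only a pool from which some test videos were drawn, and the remaining test videos came from the Internet, so this does not reconstruct the exact 10-video test set.

```python
# Hypothetical helper: the ID ranges come from the maintainer's reply;
# all names here are illustrative and not taken from the repository.
TRAIN_IDS = range(0, 335)        # sequences 000-334 (training)
TEST_POOL_IDS = range(335, 341)  # sequences 335-340 (pool for test videos)


def split_of(seq_id: str) -> str:
    """Return which split an original TikTok sequence ID falls into."""
    n = int(seq_id)
    if n in TRAIN_IDS:
        return "train"
    if n in TEST_POOL_IDS:
        return "test_candidate"
    raise ValueError(f"sequence id {seq_id!r} is outside 000-340")


def partition(seq_ids):
    """Group sequence IDs by split, preserving input order."""
    groups = {"train": [], "test_candidate": []}
    for seq_id in seq_ids:
        groups[split_of(seq_id)].append(seq_id)
    return groups
```

For example, `partition(["001", "334", "335"])` places the first two IDs under `"train"` and the last under `"test_candidate"`.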
Thanks for your kind reply! That makes things clear. Btw, could you please also upload those additional video sequences collected from the Internet to Google Drive?
Hi @Boese0601, I have submitted a request to the corporation to open-source the additional TikTok-style data. Since the data was collected by the corporation, we need to get its permission.
Hi, I downloaded the tsv file and found that it contains additional data. So, for the penultimate row of Table 1, you did not use the tsv file you released and evaluated only on the 335-340 sequences, is that correct?
@notorious-eric Hi, do you mean the evaluation data? All the models are evaluated on the same data, i.e., 10 videos that combine the original TikTok test sequences with the additional data.
What are the videos collected from the Internet for evaluation?
Why does the dataset I downloaded have no 000 and start at 001?
Hi, thanks for your great work. I checked the TikTok tsv dataset and found that you have already split it into a training set and a validation set. Since it is not easy to match each image to its original sequence ID in the dataset, could you please clarify which sequences from the original TikTok dataset (from 000 to 340) are used for training and which are used for validation? Thanks!