Incorrect FID-VID and FVD #25
Which checkpoint did you use to evaluate?
@Fanghaipeng Thanks!
exp_folder=$1
Hi, I have a similar problem.
FID/L1/SSIM/LPIPS/PSNR are similar, but FVD-3DRN50 (FID-VID) and FVD-3DInception (FVD) are different (DISCO † w/ HAP, CFG). Let me also attach some GIFs generated for FVD, to check whether the generated results are correct.
BTW, I found that generating the GIFs with imageio at 3 fps (gen_eval.sh) yields a tbr of 24.25, so when ffmpeg converts a GIF to video, the original 16-frame GIF turns into a 128-frame video. I tried changing 3 fps to 25 fps in gen_eval.sh, and the results became even weirder. P.S.: the results above were generated with PyTorch 2.0 (with some code changes for loading the checkpoint) and other newer packages. When I reproduced the exact pip package versions from the README, the '25 fps' FVD-3DRN50 was 96.15072454699538, and the other results were similar, as stated in #36.
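The frame-count blow-up described here is just the GIF's duration being resampled to ffmpeg's output timebase: 16 frames at 3 fps span about 5.33 s, and re-encoding at ~24 fps duplicates frames to fill that span. A minimal arithmetic sketch (illustrative only, not the actual gen_eval.sh pipeline; the reported tbr of 24.25 is approximated as 24 here):

```python
def frames_after_reencode(n_frames: int, gif_fps: float, video_fps: float) -> int:
    """Frame count after ffmpeg resamples a GIF to a new frame rate.

    ffmpeg keeps the clip's wall-clock duration and duplicates (or drops)
    frames so the output matches the target rate.
    """
    duration_s = n_frames / gif_fps        # 16 frames at 3 fps -> ~5.33 s
    return round(duration_s * video_fps)   # frames needed at the output rate

print(frames_after_reencode(16, 3, 24))    # 128, matching the blow-up above
```

This is why re-timing the GIF to match the evaluation frame rate (or feeding frames to the metric directly) avoids the duplicated-frame video.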
I'm hitting the same issue!
I also cannot reproduce the results using "gen_eval.sh" with the FID-VID: 18.86 model provided by the official implementation. With the default guidance scale of 3.0, my result is {'FVD-3DRN50': 21.664065154647858, 'FVD-3DInception': 567.6111442260626}. With the optimal guidance scale of 1.5 reported in the paper, my result is {'FVD-3DRN50': 23.933738779128873, 'FVD-3DInception': 564.9114347158875}, compared to the paper's FID-VID 18.86 and FVD 393.34.
Dear all, sorry for the delay: I lost access to the computing resources for this project after my internship ended in July. A few days ago I got temporary access again and revisited this codebase. I used a completely new environment to make sure the codebase can be reproduced in most setups.
We use this resnet-kinetics checkpoint and this i3d checkpoint (under eval_fvd). I think the differing results may be due to different checkpoint models, which I forgot to sync from the corporate storage.
For both the baseline and our model, we get a better FVD but a higher FID-VID. (P.S.: we used the same generated frames for the previous metric in the paper and for this new metric.) We plan to use this new metric calculation to avoid confusion, and have updated the evaluation code in the latest commit (note: if you want to verify reproduction of the previous results, do not pull the latest commit; just download the FVD pretrained models). We will update the paper ASAP. If you hit any further reproduction problems, please comment here.
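For context, both FVD-3DRN50 (FID-VID) and FVD-3DInception (FVD) reduce to a Fréchet distance between Gaussian statistics of video features from the respective backbone (3D ResNet-50 or I3D), which is why swapping the backbone checkpoint changes the numbers. A minimal sketch of the distance itself, assuming NumPy/SciPy and precomputed feature means and covariances (backbone feature extraction is omitted):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two multivariate Gaussians N(mu1, sigma1), N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; may pick up tiny imaginary parts.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

In an FVD-style pipeline, `mu`/`sigma` come from backbone features of real vs. generated clips; identical statistics give a distance of 0, and with identity covariances the distance is just the squared mean gap.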
I cannot reproduce the results with the checkpoint (TikTok Training Data), using the updated evaluation code and the provided vision models for computing the FVD metrics.
Hi, let's try to find the issue. There are actually two steps to get the results: (a) generate the images; (b) compute the metrics.
Hi, sorry for the late reply; I am currently working on another project and not focusing on video generation.
@Wangt-CN |
When I used 4 NVIDIA A100s with batch_size=2 and nframe=16 and ran gen_eval_tm.sh, I obtained results similar to @asdasdad738:
Thanks for the great work. @Wangt-CN
I tried to reproduce the results using "gen_eval.sh", but the FID-VID and FVD do not match the numbers reported in the paper. Could you help me with this? Is it possible that I am using the wrong checkpoints?
Downloaded checkpoints:
pth: TikTok Training Data (FID-VID: 18.8)
FID-VID: resnet-50-kinetics.pth: "https://github.com/yjh0410/YOWOF/releases/download/yowof-weight/resnet-50-kinetics.pth"
FVD: i3d_pretrained_400.pt: "https://drive.google.com/file/d/1mQK8KD8G6UWRa5t87SRMm5PVXtlpneJT/edit"