Video frame 'expand' when performing FVD #36

Closed
fwbx529 opened this issue Aug 11, 2023 · 1 comment

Comments

@fwbx529
fwbx529 commented Aug 11, 2023

Hi, thank you for your great work! I have a question about the FVD evaluation. I intend to build on this work, but I ran into some problems when evaluating FVD. (The other quantitative results are consistent with the paper.)

When I inspect the videos decoded from the GIFs (in tool/metrics/utils.py, 'DatasetFVDVideoResize'), each video has size [128, 112, 112, 3], even though the GIF has only 16 frames. So I checked the ffmpeg call at line 358 of tool/metrics/utils.py:

out, _ = (ffmpeg.input(path).output('pipe:', format='rawvideo', pix_fmt='rgb24').run(capture_stdout=True, quiet=False))

It prints output like the following, which suggests that ffmpeg converts the 16-frame GIF into a 128-frame video (which is then split into 8 segments according to the num_seg parameter):

Input #0, gif, from '/root/autodl-tmp/DisCo/run_test/exp/tiktok_ft/outputs//pred_gs1.5_scale-cond1.0-ref1.0_gif/TiktokDance_00337_0010png.gif':
  Duration: 00:00:05.28, start: 0.000000, bitrate: 866 kb/s
  Stream #0:0: Video: gif, bgra, 256x256, 3.03 fps, 24.25 tbr, 100 tbn, 100 tbc

Output #0, rawvideo, to 'pipe:':
  Metadata:
    encoder : Lavf58.29.100
  Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 256x256, q=2-31, 38141 kb/s, 24.25 fps, 24.25 tbn, 24.25 tbc
    Metadata:
      encoder : Lavc58.54.100 rawvideo
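
For what it's worth, the numbers in the log above are consistent with the 16 to 128 frame expansion. The following is only a back-of-the-envelope sketch: it assumes ffmpeg fills the higher output frame rate by repeating input frames, and the helper name `expected_output_frames` is made up for illustration (it is not part of the repo or of ffmpeg).

```python
# Values copied from the ffmpeg log; the interpretation is an assumption.
N_GIF_FRAMES = 16   # frames stored in the GIF
GIF_FPS = 3.03      # average input frame rate reported by ffmpeg
OUT_FPS = 24.25     # frame rate reported for the rawvideo output
NUM_SEG = 8         # num_seg used by the evaluation script


def expected_output_frames(n_frames, in_fps, out_fps):
    """Estimate how many frames come out if the clip duration is kept
    and the output is written at a constant rate of out_fps."""
    duration_s = n_frames / in_fps        # 16 / 3.03 is about 5.28 s
    return round(duration_s * out_fps)    # 5.28 * 24.25 is about 128


n_out = expected_output_frames(N_GIF_FRAMES, GIF_FPS, OUT_FPS)
frames_per_segment = n_out // NUM_SEG

print(n_out, frames_per_segment)
```

Under this reading, each GIF frame is repeated roughly 8 times (24.25 / 3.03), and each of the 8 segments holds 16 decoded frames that cover only about 2 distinct GIF frames.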

If I set the fps in gen_eval.sh to 25 (so that the video has 16 frames), FVD-3DRN50 becomes 96.15, compared with the paper's "More TikTok-Style Training Data" result (FID-FVD: 15.7).
Even if I don't change the fps (it remains 3), FVD-3DRN50 is 20.34, which still differs from the paper.

So I have 3 questions on this evaluation:

  1. Should we change fps in gen_eval.sh?
  2. As in "Incorrect FID-VID and FVD" #25, I evaluate using the following weights. FID-VID: resnet-50-kinetics.pth, "https://github.com/yjh0410/YOWOF/releases/download/yowof-weight/resnet-50-kinetics.pth", with MD5 a044310dff79e2688c342d55a0b202d2. FVD: i3d_pretrained_400.pt, "https://drive.google.com/file/d/1mQK8KD8G6UWRa5t87SRMm5PVXtlpneJT/edit", with MD5 c275f5caff95bea0b712515feedad130. Are these two correct for evaluation?
  3. In "Training/Validation Data Split" #27, the authors say the evaluation uses videos 335-340 and 5 OL videos, but the provided new10val_TiktokDance-poses-masks.yaml outputs 337/338/201/202/203. Could the correct yaml lead to the FVD results in the paper?
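
To make question 2 easy to check on another machine, here is a minimal sketch for comparing local weight files against the MD5 values quoted above. The hashes are simply the ones I observed, not values confirmed by the authors, and the names `md5_of_file`, `EXPECTED_MD5`, and `matches_expected` are illustrative rather than anything from the repo.

```python
import hashlib
import os

# MD5 values as reported in this issue (not confirmed by the authors).
EXPECTED_MD5 = {
    "resnet-50-kinetics.pth": "a044310dff79e2688c342d55a0b202d2",
    "i3d_pretrained_400.pt": "c275f5caff95bea0b712515feedad130",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 with the value listed for its file name."""
    name = os.path.basename(path)
    return md5_of_file(path) == expected[name]
```

For example, `matches_expected("/path/to/i3d_pretrained_400.pt")` returns True only if the local file has the same MD5 as the one listed in question 2.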

Thank you!

@Wangt-CN
Owner

Hi @fwbx529 , could you please check my responses here? We can reproduce the results reported in the paper.
Btw, the output frame IDs (201, 202, 203) are NOT video IDs from the original TikTok dataset; they come from the collected online videos. There is no leakage between training and testing data.
