Video frame 'expand' when performing FVD #36

fwbx529 · 2023-08-11T07:57:07Z

Hi, thank you for your great work! I have a question about the FVD evaluation. I would like to build on this work, but I ran into some problems when evaluating FVD. (The other quantitative results are consistent with the paper.)

When I check the videos decoded from the GIFs (in tool/metrics/utils.py, 'DatasetFVDVideoResize'), I find that each video has shape [128, 112, 112, 3], although the GIF has only 16 frames. So I looked at the ffmpeg call at tool/metrics/utils.py line 358:

out, _ = (ffmpeg.input(path).output('pipe:', format='rawvideo', pix_fmt='rgb24').run(capture_stdout=True, quiet=False))

It outputs something like the log below, which means it converts the 16-frame GIF into a 128-frame video (which is then split into 8 segments according to the num_seg parameter):

Input #0, gif, from '/root/autodl-tmp/DisCo/run_test/exp/tiktok_ft/outputs//pred_gs1.5_scale-cond1.0-ref1.0_gif/TiktokDance_00337_0010png.gif':
  Duration: 00:00:05.28, start: 0.000000, bitrate: 866 kb/s
    Stream #0:0: Video: gif, bgra, 256x256, 3.03 fps, 24.25 tbr, 100 tbn, 100 tbc

Output #0, rawvideo, to 'pipe:':
  Metadata:
    encoder         : Lavf58.29.100
    Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 256x256, q=2-31, 38141 kb/s, 24.25 fps, 24.25 tbn, 24.25 tbc
    Metadata:
      encoder         : Lavc58.54.100 rawvideo
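The 128-frame count is consistent with the frame rates in this log: the GIF reports an average of 3.03 fps over 5.28 s (about 16 frames), while the rawvideo output is written at 24.25 fps, so ffmpeg appears to duplicate frames to fill the higher constant rate. A minimal Python sketch of that arithmetic, using only the numbers from the log; the helper name `expected_frames` is hypothetical, and rounding duration times rate is an assumption about how the frame count comes out, not ffmpeg's actual logic:

```python
def expected_frames(duration_s: float, fps: float) -> int:
    """Approximate frame count for a clip of the given duration at a constant rate."""
    return round(duration_s * fps)


duration_s = 5.28   # "Duration: 00:00:05.28" from the log
input_fps = 3.03    # average GIF frame rate reported for the input stream
output_fps = 24.25  # frame rate reported for the rawvideo output stream

n_in = expected_frames(duration_s, input_fps)    # frames stored in the GIF
n_out = expected_frames(duration_s, output_fps)  # frames after decoding to rawvideo

print(n_in, n_out, n_out // n_in)  # prints "16 128 8"
```

Under these assumptions each GIF frame is repeated about 8 times, which matches the observed [128, 112, 112, 3] shape.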

If I set the fps in gen_eval.sh to 25 (so that the video has 16 frames), FVD-3DRN50 becomes 96.15 (compared with the "More TikTok-Style Training Data" entry, FID-FVD: 15.7).
Even if I leave the fps unchanged (at 3), FVD-3DRN50 is 20.34, which still differs from the paper.

So I have 3 questions on this evaluation:

1. Should we change the fps in gen_eval.sh?
2. As in "Incorrect FID-VID and FVD" (#25), I evaluate FVD using: FID-VID: resnet-50-kinetics.pth : "https://github.com/yjh0410/YOWOF/releases/download/yowof-weight/resnet-50-kinetics.pth" with MD5 a044310dff79e2688c342d55a0b202d2; FVD: i3d_pretrained_400.pt : "https://drive.google.com/file/d/1mQK8KD8G6UWRa5t87SRMm5PVXtlpneJT/edit" with MD5 c275f5caff95bea0b712515feedad130. Are these two the correct weights for evaluation?
3. In "Training/Validation Data Split" (#27), the authors say the evaluation uses videos 335-340 and 5 OL videos, but the provided new10val_TiktokDance-poses-masks.yaml outputs 337/338/201/202/203. Would the correct yaml lead to the FVD results in the paper?
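For anyone comparing against the same checkpoints, the MD5 values quoted above can be checked locally. This is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `check_weights` and the idea that the files sit in one local directory are assumptions for illustration, not part of the repository:

```python
import hashlib
import os

# MD5 values as quoted in this issue.
EXPECTED_MD5 = {
    "resnet-50-kinetics.pth": "a044310dff79e2688c342d55a0b202d2",
    "i3d_pretrained_400.pt": "c275f5caff95bea0b712515feedad130",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_weights(weights_dir: str) -> dict:
    """Map each expected filename to True/False (digest match) or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(weights_dir, name)
        if not os.path.isfile(path):
            results[name] = None
        else:
            results[name] = md5_of_file(path) == expected
    return results
```

Calling `check_weights("/path/to/weights")` (a placeholder directory) returns a per-file result, so a mismatched or missing checkpoint can be ruled out before looking at the fps setting.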

Thank you!

Wangt-CN · 2023-09-12T20:46:08Z

Hi @fwbx529, could you please check my responses here? We can reproduce the results reported in the paper.
By the way, the output IDs (201, 202, 203) are NOT video IDs from the original TikTok dataset; they come from the collected online videos. There is no leakage between the training and testing data.

fwbx529 mentioned this issue Aug 11, 2023

Incorrect FID-VID and FVD #25

Open

Wangt-CN closed this as completed Oct 12, 2023
