Cannot reproduce the results #12
Hello @bowen-xiao96 ! I'd love to know more about the issues you encountered that make my code unreproducible for you. There are no hidden tricks per se. I followed what Egor Zakharov wrote in his paper and on social media, so Adam should be used without momentum. The generator should be pretty deep for good results; I used 17 residual blocks in total. What made the difference for me was the AdaIN parameters: there should be 2 parameters for each channel width in each normalization layer.
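For illustration, a minimal AdaIN block along those lines could look like the sketch below. The class name, channel count, and style dimension are placeholders rather than the repo's actual code, but it shows the two per-channel parameters (a scale and a bias) predicted from the style vector.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: one scale and one bias per channel
    (2 parameters per channel width), predicted from a style vector."""
    def __init__(self, num_channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.to_scale_bias = nn.Linear(style_dim, 2 * num_channels)

    def forward(self, x, style):
        # style: (N, style_dim) -> per-channel gamma and beta: (N, C, 1, 1)
        gamma, beta = self.to_scale_bias(style).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * self.norm(x) + beta

# Toy usage with made-up sizes:
ada = AdaIN(num_channels=256, style_dim=512)
out = ada(torch.randn(2, 256, 32, 32), torch.randn(2, 512))
```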
Hi @bowen-xiao96 , have you tried the code after the modification to the VGG loss you mentioned here? Can you reproduce the results of @vincent-thevenin after applying this modification?
Hi @vincent-thevenin , I tried your code for training but I can only get some blurry results (examples shown below) after training ~7 epochs on the VoxCeleb2 test set. Besides, when I use your provided pre-trained model as initialization, the generated results get worse as I train for more iterations with your code. So there may be some differences between your posted code and the training code used for that pre-trained model?
Hi @mikirui, I just made some further changes to how the gradient flow is handled. After fixing @bowen-xiao96 's error I did not test train.py immediately, so your blurriness might come from there. However, at the latest commit it should be working better. Can you please pull the latest changes and tell me if you see any improvement?
Hi @vincent-thevenin , I tried your latest code and the results (after ~2 epochs) are shown below: I also used the dataloader from https://github.com/grey-eye/talking-heads (which is faster) and ran about 5 epochs. The results are shown: The generated image quality is better than with the last version of the code, but I think the results are still much blurrier than the results shown in your README?
First of all, @vincent-thevenin, great work - I really appreciate it! @mikirui, it seems I did exactly what you did. Results generated using the latest repo code (train dataset, after ~1 epoch; it took ~22 hours on the hardware I have available). Maybe there is a chance that in the next 4 epochs it would converge to results similar to those you present in the README, but I do not think so. Results when the grey-eye/talking-heads data preprocessing and data loader are used: after 5 epochs (there are still a lot of artifacts and it certainly does not look like the README's results). And ... in the 10th epoch something goes really wrong and the results start to be "red". The loss for 25 epochs (in the 10th epoch a small increase of the loss is visible, but nothing major). Bottom line: I also tried your pre-trained model and I am getting really good results with it. I would love to be able to train a model achieving results similar to the pre-trained one. Do you have any suggestions or ideas? Thank you!
@vincent-thevenin Thanks for sharing this great repo. I'm trying this model with a larger training set (still a subset of the entire VoxCeleb2 dev set). During the first few steps it seems like I can see a human face like this. But the model collapses to completely black outputs very soon, at around 3k~5k steps. Has anyone run into similar problems? *Update: tried to train from scratch using the test set of VoxCeleb2 and got similar results. More information: Module versions: Environment: Any help would be appreciated. *Update 2: Now I run into the same situation as @OndrejTexler . Everything looks red after ~22k steps. The content loss and feature matching loss increase dramatically.
Has anyone successfully reproduced results like @vincent-thevenin 's checkpoint?
Got the same issue as @OndrejTexler . I tried to train the model (using the VoxCeleb2 training dataset); after 5000 iterations I used the checkpoint for inference and got "red" results.
20k steps on the vox dev set here. Although results around 10k still have color in them, around 15k steps everything appears to be red. I used a batch size of 4 along with K=8.
I got the same error as you, @brucechou1983. Have you found the reason?
@OndrejTexler @mikirui @brucechou1983 @rexxar-liang @Selimonder @yushizhiyao , Thank you for your feedback! After working on it, I successfully reversed the red outputs; as of now it looks like the model starts over as it recovers from the collapse. The problem seems to come from how I updated the weights in train.py: I updated the generator and the discriminator weights at the same time by calculating the gradient of the sum of lossG and lossD. I am training the model some more at the moment to check that the problem really comes from that and that I'm not mistaken.
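For anyone following along, the fix being described amounts to giving each network its own backward pass and optimizer step instead of backpropagating the summed loss once. A self-contained toy sketch (the networks, losses, and learning rates below are stand-ins, not the repo's actual training loop):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the generator and discriminator.
G = nn.Linear(8, 8)
D = nn.Linear(8, 1)
optimizerG = torch.optim.Adam(G.parameters(), lr=5e-5)
optimizerD = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(4, 8)
fake = G(torch.randn(4, 8))

# Generator step: only G's parameters are updated by optimizerG.
lossG = -D(fake).mean()
optimizerG.zero_grad()
lossG.backward()
optimizerG.step()

# Discriminator step on a detached fake, with its own gradients and optimizer.
lossD = torch.relu(1.0 - D(real)).mean() + torch.relu(1.0 + D(fake.detach())).mean()
optimizerD.zero_grad()
lossD.backward()
optimizerD.step()
```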
I have added a commit to the master branch. But you should check out the other branch; I am actively maintaining that one and will merge it soon. The second branch preprocesses the dataset and uses custom folder paths. I managed to decrease training time by 20x compared to the main branch.
Hi, @vincent-thevenin . Do you mean 2x?
@dimaischenko I went from some unsightly 240 hours/epoch on the full dataset on my setup to less than 12h. So indeed 20x :) Also the compressed preprocessed dataset is 17GB compared to 270GB for the full one.
@vincent-thevenin have you tried to train with the new loss and gotten good results? I tried for 1 epoch on the full dataset
@vincent-thevenin Thanks for sharing your findings. May I know the hardware requirement (GPU memory budget) to run your latest updates on the whole dataset?
Hello @vincent-thevenin! Did you achieve good results with the new loss?
@dimaischenko I got bad results as well. I experimented a bit, and disabling the adversarial loss and the feature matching loss to keep just the content loss creates outputs similar to what I would get on the main branch. I'm still looking into it and will post an update once I reach good results.
@brucechou1983 The model uses 8 GB of VRAM with a batch size of 2. I haven't tested with a batch size of 1, but if you're having memory problems, try reducing the batch size to 1 first.
Hey @vincent-thevenin I'm trying to use your latest branch, but I ran into some problems, mostly just understanding what some of your parameters are. What are path_to_Wi and path_to_preprocess? I assumed path_to_preprocess was the path to the VoxCeleb dataset, but when I plug in the correct path it just errors out.
Hi @vincent-thevenin , for the 20x faster training branch, do you mean the save_disc branch? Have you already gotten good results from that branch? Thanks
Hi @Bip3, path_to_Wi is the path to a folder that contains the discriminator vectors for each video. I started using the full dataset, and loading everything onto the GPU just consumes memory uselessly, so I save and load the vectors when necessary; the folder is filled when you initialize the Discriminator. path_to_preprocess is the folder where you save the preprocessed images after running preprocess.py. I will update the readme to make the changes clearer.
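For illustration, loading a per-video W_i vector from that folder could look like the sketch below. The file layout (one <video_id>.pt file per video) and the embedding size are assumptions for the example, not the repo's actual code.

```python
import os
import torch

def load_or_init_wi(path_to_Wi, video_id, embedding_dim=512):
    """Load the per-video discriminator vector W_i from disk, creating and
    saving a fresh one the first time a video is seen."""
    path = os.path.join(path_to_Wi, f"{video_id}.pt")
    if os.path.isfile(path):
        return torch.load(path)
    w_i = torch.randn(embedding_dim, 1)  # new vector for an unseen video
    os.makedirs(path_to_Wi, exist_ok=True)
    torch.save(w_i, path)
    return w_i
```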
@shiyingZhang90 this is my result for 15 epochs, still training it further at the moment.
@vincent-thevenin thanks so much for the update! Actually I don't think calculating the gradient of the sum of lossG and lossD is wrong according to the paper, hence I'm looking forward to great results from the updated code. BTW, I found that the way you calculate the content loss is different from another repo. Will that reference code help?
Hey @vincent-thevenin , thanks for the response. Could you give instructions on how you got your save_disc branch training? Edit: Figured it out. For anyone wondering, you must run preprocessing.py and save the output in the folder specified under path_to_preprocess in params. Everything else is pretty much the same as master.
Hi @vincent-thevenin, do you get better results after training for more epochs on the save_disc branch? I'm still wondering how to reach the training results in your demo.
@Bip3 @shiyingZhang90 Hello again, good news: I managed to produce great results with no collapse. Thank you for your patience. Here are some sample images:
Hey @vincent-thevenin.
@vincent-thevenin did you achieve this result with the current save_disc branch (commit: e461da8)?
@vincent-thevenin @brucechou1983 Hi, I used the current save_disc branch (commit: e43ca9f) and ran on the full dataset with K=8 and batch size 6. The results after 3 epochs: Have you seen this kind of result during training? Should I train for more epochs?
Hi, I have trained for another 10 epochs and got results like yours @vincent-thevenin. The loss_content kept going down, but loss_adv and loss_matching ended up going up over the training epochs. However, it seems the model cannot keep the source identity; the generated image seems to be a different person from the input image. I also have a question about the Loss_adv. I wonder if this is correct? Or should we multiply Loss_FM by 1/layers or something? The Pix2PixHD PyTorch implementation also uses the FM loss, and it does this normalization.
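For reference, the 1/layers normalization being asked about, sketched Pix2PixHD-style (the function below is illustrative, not taken from either repo):

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(feats_fake, feats_real):
    """L1 feature matching averaged over discriminator layers (Pix2PixHD-style).
    feats_fake / feats_real are lists of intermediate discriminator activations."""
    loss = 0.0
    for f_fake, f_real in zip(feats_fake, feats_real):
        loss = loss + F.l1_loss(f_fake, f_real.detach())
    return loss / len(feats_fake)  # the 1/layers normalization in question
```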
Hi @vincent-thevenin , Good news! The following results are generated from the same person (id_08696) with different driving videos. Click the images to view the video results on YouTube. 1. Feed forward without finetuning As we can see, an identity gap exists in the feed-forward results, which can be bridged by finetuning.
@kaahan However, in this repo, vgg19 and vggface take images in RGB order and normalized to [0-1], while keeping the loss weights the same as in the paper, i.e. So you should either change the weight of the content loss, or change the pretrained model to the caffe pretrained version, to make the final loss balanced. For me, I downloaded the caffe version of vgg19 from https://github.com/jcjohnson/pytorch-vgg. Main code:
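(The snippet below is a minimal sketch of that caffe-style conversion rather than the exact code: it turns an RGB tensor in [0, 1] into BGR, [0, 255], mean-subtracted input, using the standard ILSVRC channel means; the function name is illustrative.)

```python
import torch

# ImageNet channel means used by the caffe-trained VGG models, in BGR order.
VGG_MEAN_BGR = torch.tensor([103.939, 116.779, 123.68]).view(1, 3, 1, 1)

def to_caffe_vgg_input(x_rgb01):
    """Convert an (N, 3, H, W) RGB tensor in [0, 1] into caffe VGG input:
    BGR channel order, [0, 255] range, per-channel mean subtracted."""
    x_bgr = x_rgb01[:, [2, 1, 0], :, :] * 255.0
    return x_bgr - VGG_MEAN_BGR.to(x_rgb01.device)
```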
Edit: I ran the meta-training for ~8 epochs on the VoxCeleb2 dev dataset. Edit 2: I've created PR #56 for the update of the VGG loss calculation.
Hi @kaahan,
It would be helpful if you could share your results,
Hey @Jarvisss
|
I'm curious, is it possible to share weights, or at least lossG vs. training steps? I'm seeing a big mismatch between Vincent's losses and yours: his lossG is around 10, yours around 100. Also I'm seeing losses of 0 for the discriminator with your suggested changes vs. normal losses in Vincent's code.
After 8 epochs?
Hi, @Jarvisss @vincent-thevenin Do you think the model is training well? How many epochs did you train the model for? I would appreciate your reply!
Have you reproduced the results successfully?
Hi, @lvZic |
Hi @Jarvisss,
or
I got the same problem. I trained on only 5000 videos and am on the 4th epoch so far.
Hello! I have also tried to reproduce this paper myself. However, with a very similar network architecture and AdaIN settings, I can only achieve low-resolution faces placed on a very fuzzy background. Training for more epochs does not improve performance. I am amazed by your example images, but they cannot be reproduced with your GitHub code. After carefully reading all the code, I found that there may be a mistake in https://github.com/vincent-thevenin/Realistic-Neural-Talking-Head-Models/blob/master/loss/loss_generator.py#L30
The output.data operation seems to be cutting off the gradient flow, making the content loss useless.
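A toy illustration of the point (not the repo's code): a forward hook that stores output.data gets a tensor with no grad_fn, so a loss built from it sends no gradient back, whereas storing output itself keeps the graph intact.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
x = torch.randn(1, 4, requires_grad=True)

captured = {}
def hook(module, inputs, output):
    captured["detached"] = output.data  # .data silently drops the autograd graph
    captured["attached"] = output       # keeps grad_fn, so gradients can flow

layer.register_forward_hook(hook)
layer(x)

print(captured["detached"].requires_grad)  # False -> a loss on this gives no gradient
print(captured["attached"].requires_grad)  # True  -> gradients reach the generator
```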
I would also like to ask whether there are any other hidden tricks in training, for example the Adam momentum or the depth of the generator. Thank you!