Cannot reproduce the results #12

Open
bowen-xiao96 opened this issue Aug 4, 2019 · 42 comments
Comments

@bowen-xiao96

Hello! I have also tried to reproduce this paper. However, with a very similar network architecture and AdaIN settings, I can only achieve low-resolution faces placed on a very fuzzy background. Training for more epochs does not improve performance. I am shocked by your example images, but I cannot reproduce them with your GitHub code. After carefully reading all the code, I found that there may be a mistake at https://github.com/vincent-thevenin/Realistic-Neural-Talking-Head-Models/blob/master/loss/loss_generator.py#L30

   def vgg_x_hook(module, input, output):
       vgg_x_features.append(output.data)
   def vgg_xhat_hook(module, input, output):
       vgg_xhat_features.append(output.data)

The output.data operation seems to be cutting off the gradient flow, making the content loss useless.
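
A minimal sketch of this concern, using a toy nn.Linear as a stand-in for a VGG layer (the layer, the tensor names, and the shapes are assumptions for illustration, not the repository's code). It suggests that a feature captured through output.data is detached from the autograd graph, while the plain output still carries gradient back to the input:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(2, 2)                        # hypothetical stand-in for a VGG layer
x_hat = torch.ones(1, 2, requires_grad=True)   # hypothetical stand-in for the generator output

captured = {}

def hook(module, input, output):
    # .data shares storage with output but is cut off from the autograd graph
    captured["data"] = output.data
    # keeping the tensor itself preserves the graph
    captured["graph"] = output

handle = layer.register_forward_hook(hook)
layer(x_hat)
handle.remove()

print(captured["data"].requires_grad)    # detached copy
print(captured["graph"].requires_grad)   # still attached to the graph

# A loss built on the attached tensor sends gradient back to x_hat.
captured["graph"].abs().sum().backward()
print(x_hat.grad is not None)
```

A loss built only on captured["data"] has no path back to x_hat, which matches the observation that the content loss would not train the generator.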

I would also like to ask whether there are any other hidden tricks in training, for example the Adam momentum or the depth of the generator. Thank you!

@vincent-thevenin
Owner

Hello @bowen-xiao96 !
Thanks for pointing that out, I'm puzzled as to how I got my results with the content loss dropping its gradient. That's a major mistake on my part.

I'd love to know more about the issues you encountered that make my code unreproducible for you.

There are no hidden tricks per se. I followed what Egor Zakharov wrote in his paper and on social media, so Adam should be used without momentum. The generator should be pretty deep for good results; I used 17 residual blocks in total. What made the difference for me was the AdaIN parameters: there should be 2 parameters for each width in each normalization layer.
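
For readers unfamiliar with AdaIN, here is a rough sketch of the operation. It assumes that "2 parameters for each width" means one (mean, std) pair per channel; that reading, the function name adain, and the shapes are assumptions, not taken from the repository:

```python
import torch

def adain(features, mean, std, eps=1e-5):
    """Adaptive instance normalization (sketch).

    features: (B, C, H, W) feature map
    mean, std: (B, C) target statistics, i.e. two parameters per channel
    """
    b, c = features.shape[:2]
    flat = features.reshape(b, c, -1)
    # per-sample, per-channel statistics of the incoming features
    mu = flat.mean(dim=2).view(b, c, 1, 1)
    sigma = flat.std(dim=2).view(b, c, 1, 1) + eps
    normalized = (features - mu) / sigma
    # re-scale and shift with the supplied parameters
    return std.view(b, c, 1, 1) * normalized + mean.view(b, c, 1, 1)

torch.manual_seed(0)
feats = torch.randn(2, 3, 4, 4)
out = adain(feats, mean=torch.full((2, 3), 2.0), std=torch.full((2, 3), 0.5))
```

After the call, each channel of out has approximately the requested mean and standard deviation.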

@mikirui

mikirui commented Aug 28, 2019

Hi @bowen-xiao96 ,

Have you tried the code after the modification on vgg loss you mentioned here? Can you reproduce the results of @vincent-thevenin after applying this modification?
Thank you.

@mikirui

mikirui commented Aug 28, 2019

Hi @vincent-thevenin ,

I tried your code for training, but I can only get blurry results (example shown below) after training for ~7 epochs on the VoxCeleb2 test set.
output_11000

Besides, when I use your provided pre-trained model as initialization, the generated results get worse the more iterations I train with your code. Could there be some differences between the posted code and the training code used for that pre-trained model?
Many thanks if you can help figure this out!

@vincent-thevenin
Owner

vincent-thevenin commented Sep 4, 2019

Hi @mikirui,
Thank you for trying the code out, and sorry for the late answer. I somehow did not receive any notification of mentions.

I just made some further changes to how the gradient flow is handled. After fixing @bowen-xiao96 's error I did not test train.py immediately so your blurriness might come from there. However at the latest commit it should be working better. Can you please pull the latest changes and tell me if you see any changes?

@mikirui

mikirui commented Sep 8, 2019

Hi @vincent-thevenin ,
Thanks for your reply and the updates to the code.

I tried your latest code and the results (after ~2 epochs) are shown below:
y_8100
output_8100

I also used a dataloader like the one in https://github.com/grey-eye/talking-heads (which is faster) and ran about 5 epochs. The results are shown below:
y_34900
output_34900

The generated image quality is better than with the previous version of the code, but I think the results are still considerably blurrier than the ones shown in your README?

@OndrejTexler

First of all, @vincent-thevenin, great work - I really appreciate it!

@mikirui, it seems I did exactly what you did.

Results generated using the latest repo code (train dataset, after ~1 epoch, which took ~22 hours on the hardware I have available). There is a chance that in the next 4 epochs it would converge to results similar to those you present in the README, but I do not think so.
res_1

Results when the grey-eye/talking-heads data preprocessing and data loader are used, after 5 epochs (there are still a lot of artifacts and it certainly does not look like the README's results):
res_5_2200_1

And in the 10th epoch something goes really wrong: the results start to be "red", and it never corrects. Result after 25 epochs:
res_5_2200_1

The loss for 25 epochs (in the 10th epoch a small increase in the loss is visible, but nothing major):
res_5_2200_1

Bottom line:
The dataloader of grey-eye/talking-heads has significantly less randomness, but it works in their scenario, so I guess it is not the issue here.

Also, I tried your pre-trained model and I am getting really good results with it. I would love to be able to train a model that achieves results similar to the pre-trained one.

Do you have any suggestions or ideas? Thank you!

@brucechou1983

brucechou1983 commented Sep 18, 2019

@vincent-thevenin Thanks for sharing this great repo.

I'm trying this model with a larger training set (still a subset of the entire VoxCeleb2 dev set). During the first few steps I can see something like a human face:

1399

But the model collapses to all black very soon, at around 3k~5k steps.

3499

Has anyone run into similar problems?

*Update: I tried training from scratch using the VoxCeleb2 test set and got similar results.

image

More information:
I had difficulty building caffe so I used the Pytorch_VGGFACE.pth and Pytorch_VGGFACE_IR.py shared in issue #10 . The training data is VoxCeleb2 test set which contains 36273 videos. Hyper-parameters remain untouched. I also tried to increase the feature matching loss weights from 1e1 to 5e1, but the model collapsed even faster.

Module versions:
numpy==1.16.1
torch==1.2.0
torchvision==0.4.0

Environment:
ubuntu 16.04
1080Ti

Any help would be appreciated.

*Update 2:
The mode collapsing issue went away after I built these vggface files myself.

Now I run into the same situation as @OndrejTexler . Everything looks red after ~22k steps.

image

The content loss and feature matching loss increase dramatically.

@brucechou1983

brucechou1983 commented Sep 25, 2019

Has anyone successfully reproduced results like @vincent-thevenin 's checkpoint?

@rexxar-liang

I got the same issue as @OndrejTexler . I tried to train the model (using the VOX2 training dataset); after 5000 iterations I used the checkpoint for inference and got "red" results.

@Selimonder

I am at step 20k on the vox-dev set here. Although the results around step 10k have colors in them, around step 15k everything appears red. I used a batch size of 4 along with k=8.
Some kind of weird gradient explosion?

@yushizhiyao

I got the same error as yours @brucechou1983 , have you found the reason?

@vincent-thevenin
Owner

@OndrejTexler @mikirui @brucechou1983 @rexxar-liang @Selimonder @yushizhiyao , Thank you for your feedback!
I finally got my hands on better hardware and reproduced your error after 2.4 epochs on the full dataset.

After working on it, I successfully reversed the red outputs. As of now it looks like the model starts over as it recovers from the collapse.

The problem seems to come from how I updated the weights in train.py. I update the generator and the discriminator weights at the same time by calculating the gradient of the sum of lossG and lossD.
However, a component of lossG, lossAdv, cancels out lossD when summed together. So the gradient is the same when it should point in different directions for the generator and the discriminator.
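
One conventional way to keep the two gradients apart is to alternate the updates with separate optimizers. The sketch below uses toy linear layers and a hinge-style adversarial loss; the names G, D, opt_G, opt_D and the hinge form are assumptions for illustration, and this is not necessarily the change made in train.py:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (hypothetical): the real networks are far larger.
G = nn.Linear(4, 4)
D = nn.Linear(4, 1)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

x = torch.randn(8, 4)   # "real" samples
z = torch.randn(8, 4)   # generator input

# Discriminator step: the fake batch is detached, so no gradient reaches G.
x_hat = G(z)
loss_D = torch.relu(1.0 - D(x)).mean() + torch.relu(1.0 + D(x_hat.detach())).mean()
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

d_before = D.weight.detach().clone()
g_before = G.weight.detach().clone()

# Generator step: gradient flows through D to reach G, but only opt_G steps,
# so D's weights are left untouched by the adversarial term of the generator loss.
loss_G = -D(G(z)).mean()
opt_G.zero_grad()
loss_G.backward()
opt_G.step()

print(torch.equal(d_before, D.weight))   # D unchanged by the generator step
print(torch.equal(g_before, G.weight))   # G has moved
```

Each loss then only ever moves the parameters it is meant to train; opt_D.zero_grad() clears any gradient the generator step left on D before the next discriminator update.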

I am training the model some more at the moment to see if the problem really comes from that and that I'm not mistaken.

@vincent-thevenin
Owner

I have added a commit in the master branch. But you should check out the other branch: I am actively maintaining that one and will merge soon. The second branch preprocesses the dataset and uses custom folder paths. I managed to decrease training time by 20x compared to the main branch.

@dimaischenko

by 20x compared to the main branch.

Hi, @vincent-thevenin . Do you mean 2x?

@vincent-thevenin
Owner

vincent-thevenin commented Jan 17, 2020

@dimaischenko I went from some unsightly 240 hours/epoch on the full dataset on my setup to less than 12h. So indeed 20x :) Also the compressed preprocessed dataset is 17GB compared to 270GB for the full one.

@dimaischenko

The problem seems to come from how I updated the weights in train.py. I update the generator and the discriminator weights at the same time by calculating the gradient of the sum of lossG and lossD.
However, a component of lossG, lossAdv, cancels out lossD when summed together. So the gradient is the same when it should point in different directions for the generator and the discriminator.

@vincent-thevenin have you tried to train with the new loss, and did you get good results? I tried for 1 epoch on the full dev dataset, but lossG is equal to 100 and the results are much worse than with the previous loss (lossG + lossD).

@brucechou1983

@dimaischenko I went from some unsightly 240 hours/epoch on the full dataset on my setup to less than 12h. So indeed 20x :) Also the compressed preprocessed dataset is 17GB compared to 270GB for the full one.

@vincent-thevenin Thanks for sharing your findings. May I know the hardware requirement (GPU memory budget) to run your latest updates for the whole dataset?

@dimaischenko

Hello @vincent-thevenin! Did you achieve good results with the new loss?

@vincent-thevenin
Owner

@dimaischenko I got bad results as well. I experimented a bit: disabling the adversarial loss and the matching loss to keep just the content loss creates outputs similar to what I would get on the main branch. I'm still looking into it and will notify you once I reach good results.

@vincent-thevenin
Owner

@brucechou1983 The model uses 8 GB of VRAM with a batch size of 2. I haven't tested with a batch size of 1, but if you're having problems with memory, try reducing the batch size to 1 first.

@Bip3

Bip3 commented Feb 3, 2020

Hey @vincent-thevenin I'm trying to use your latest branch, but I ran into some problems mostly just understanding what some of your parameters were. What are path_to_Wi and path_to_Preprocess? I assumed path to preprocess was the path to the voxceleb dataset, but when I plug in the correct path it just errors out.

@shiyingZhang90

Hi @vincent-thevenin , by the branch with 20x decreased training time, do you mean the save_disc branch? Have you already gotten good results from that branch? Thanks

@vincent-thevenin
Owner

Hi @Bip3, path_to_Wi is the path to a folder that contains the discriminator vectors for each video. I started using the full dataset, and loading everything onto the GPU just consumes memory uselessly, so I save and load when necessary. The folder is filled when you initialize the Discriminator.
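
The save-and-load-on-demand idea for path_to_Wi can be sketched roughly as follows. The file naming scheme, the dictionary key, the vector size of 512, and the helper names save_wi and load_wi are all hypothetical; the branch itself may organize the folder differently:

```python
import os
import tempfile
import torch

def save_wi(path_to_wi, video_id, w_i):
    # Hypothetical layout: one small file per video, named after the video index.
    os.makedirs(path_to_wi, exist_ok=True)
    torch.save({"W_i": w_i.detach().cpu()},
               os.path.join(path_to_wi, f"W_{video_id}.pt"))

def load_wi(path_to_wi, video_id, device="cpu"):
    # Only the vector needed for the current video is brought into memory.
    state = torch.load(os.path.join(path_to_wi, f"W_{video_id}.pt"),
                       map_location=device)
    return state["W_i"].to(device)

torch.manual_seed(0)
path_to_wi = tempfile.mkdtemp()   # stands in for the configured folder
w = torch.randn(512, 1)           # assumed size, for illustration only
save_wi(path_to_wi, 0, w)
restored = load_wi(path_to_wi, 0)
```

Keeping the vectors on disk trades a little I/O per batch for not holding one vector per video in GPU memory.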

path_to_preprocess is the folder where you save the preprocessed images after running preprocess.py.

I will update the readme to make the changes clearer.
Hope that helps.

@vincent-thevenin
Owner

@shiyingZhang90 this is my result for 15 epochs, still training it further at the moment.

image

@shiyingZhang90

@vincent-thevenin thanks so much for the update! Actually I don't think calculating the gradient of the sum of lossG and lossD is wrong according to the paper, so I am looking forward to great results from the updated code. BTW, I found that the way you calculate the content loss is different from another repo. Would that reference code help?

@Bip3

Bip3 commented Feb 24, 2020

Hey @vincent-thevenin , thanks for the response. Could you give instructions on how you got your save_disc branch training?

Edit: Figured it out. For anyone wondering, you must run preprocessing.py, and save that in the same folder specified under path_to_preprocess in params. Everything else is pretty much the same as master.

@shiyingZhang90

Hi @vincent-thevenin, did you get better results after training for more epochs on the save_disc branch? I'm still wondering how to achieve the training results in your demo.

@vincent-thevenin
Owner

vincent-thevenin commented Apr 6, 2020

@Bip3 @shiyingZhang90 Hello again, good news, I managed to produce great results with no collapse. Thank you for your patience. Here are some sample images:
image
image
image

@prateek-manocha

prateek-manocha commented Apr 29, 2020

Hey @vincent-thevenin.
Great work! Can you please share the weights from which you got the above-shown results?
These are from save_disc branch, right?
I am planning to train the code on the full dataset, so it would be helpful if you could clarify this.

@brucechou1983

@vincent-thevenin did you achieve this result with the current save_disc branch (commit: e461da8)?

@Jarvisss

Jarvisss commented Jul 26, 2020

@vincent-thevenin @brucechou1983

Hi, I used the current save_disc branch (commit: e43ca9f), and ran on the full dataset with K=8 and batchsize=6.
The results seem to be different from yours: much blurrier, with artifacts.

the results after 3 epochs:

epoch_3_batch_4799

epoch_3_batch_18399

result after 4 epochs:
epoch_4_batch_799

epoch_4_batch_11699

Have you seen this kind of result during training? Should I train for more epochs?

losses G :
lossG

and losses D:
lossD

@Jarvisss

Jarvisss commented Aug 1, 2020

Hi, I have trained for another 10 epochs and got results like yours @vincent-thevenin,

epoch_10_batch_36499

epoch_10_batch_39499

epoch_10_batch_40499

The loss_content kept going down, but loss_adv and loss_matching ended up going up over the training epochs.
image

But it seems the model cannot keep the source identity: the generated image seems to be a different person from the input image.

I also have a question about Loss_adv.
In your implementation, loss_FM is computed by summing up the L1 losses of the discriminator's feature maps, without normalization.

I wonder if this is correct? Or should we multiply Loss_FM by 1/layers or something? The Pix2PixHD PyTorch implementation also uses the FM loss, and they do the normalization.
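
To make the question concrete, here is a small sketch of a feature matching loss with and without the 1/layers factor. The function name feature_matching_loss and the normalize flag are hypothetical, and this is neither the repository's nor Pix2PixHD's exact code:

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(real_feats, fake_feats, normalize=True):
    """L1 distance between discriminator feature maps of real and generated images.

    real_feats, fake_feats: lists with one tensor per discriminator layer.
    With normalize=True the sum over layers is divided by the number of layers.
    """
    per_layer = [F.l1_loss(fake, real.detach())   # no gradient through the real branch
                 for real, fake in zip(real_feats, fake_feats)]
    total = torch.stack(per_layer).sum()
    return total / len(per_layer) if normalize else total

# Three dummy layers whose features differ by exactly 1 everywhere.
real = [torch.ones(1, 2, 3, 3) for _ in range(3)]
fake = [torch.zeros(1, 2, 3, 3) for _ in range(3)]
print(feature_matching_loss(real, fake, normalize=False))
print(feature_matching_loss(real, fake, normalize=True))
```

With three layers the unnormalized value is three times the normalized one, so the choice effectively rescales the feature matching weight relative to the other loss terms.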

@Jarvisss

Jarvisss commented Aug 27, 2020

Hi @vincent-thevenin , Good News!
I've got some good results with a few changes to your code.

The following results are generated from the same person (id_08696) with different driving videos.

Click the images to view video results on Youtube

1. Feed forward without finetuning

2. Fine tuning for 100 epochs

More results:

As we can see, an identity gap exists in the feed-forward results, which can be bridged by fine-tuning.

@Jarvisss

Jarvisss commented Aug 27, 2020

@kaahan
The vgg19 and vggface losses mentioned in the paper use the Caffe-trained versions, whose input should be in BGR order, in the range [0-255].

However, in this repo, vgg19 and vggface take images in RGB order, normalized to [0-1], while the loss weights are kept the same as in the paper, i.e. vgg19_weight=1.5e-1, vggface_weight=2.5e-2

So you should either change the weight of the content loss, or change the pretrained model to the Caffe-pretrained version, to make the final loss balanced.

For me, I downloaded the Caffe version of vgg19 from https://github.com/jcjohnson/pytorch-vgg,
and made the input to vgg be in the range [0-255], BGR order.

Main code:

self.vgg19_caffe_RGB_mean = torch.FloatTensor([123.68, 116.779, 103.939]).view(1, 3, 1, 1).to(device) # RGB order
self.vggface_caffe_RGB_mean = torch.FloatTensor([129.1863,104.7624,93.5940]).view(1, 3, 1, 1).to(device) # RGB order

x_vgg19 = x * 255  - self.vgg19_caffe_RGB_mean
x_vgg19 = x_vgg19[:,[2,1,0],:,:]
x_hat_vgg19 = x_hat * 255 - self.vgg19_caffe_RGB_mean
x_hat_vgg19 = x_hat_vgg19[:,[2,1,0],:,:]
x_vggface = x * 255 - self.vggface_caffe_RGB_mean
x_vggface = x_vggface[:,[2,1,0],:,:] # B RGB H W -> B BGR H W
x_hat_vggface = x_hat * 255 - self.vggface_caffe_RGB_mean
x_hat_vggface = x_hat_vggface[:,[2,1,0],:,:] # B RGB H W -> B BGR H W

Edit: I ran the meta-training for ~8 epochs on the voxceleb2 dev dataset

Edit 2: I've created PR #56 to update the vgg loss calculation
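
The preprocessing described in this comment can also be written as a standalone helper. This is a sketch only: the function name to_caffe_input is hypothetical, and the mean values are the vgg19 ones quoted in the snippet above:

```python
import torch

# RGB-order channel means, as quoted in the comment for the Caffe vgg19
VGG19_CAFFE_RGB_MEAN = torch.tensor([123.68, 116.779, 103.939]).view(1, 3, 1, 1)

def to_caffe_input(x, rgb_mean):
    """Convert a (B, 3, H, W) RGB batch in [0, 1] to mean-subtracted BGR in [0, 255] scale."""
    x = x * 255.0 - rgb_mean.to(x.device)   # rescale, then subtract the per-channel mean
    return x[:, [2, 1, 0], :, :]            # reorder channels RGB -> BGR

x = torch.ones(1, 3, 2, 2)                  # an all-white RGB image
x_vgg19 = to_caffe_input(x, VGG19_CAFFE_RGB_MEAN)
print(x_vgg19[0, :, 0, 0])                  # channel 0 is now blue: 255 - 103.939
```

The same helper would apply to the vggface input by passing that network's mean instead.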

@Jarvisss

Jarvisss commented Sep 2, 2020

Hi @kaahan,

  1. I use the meta-embedding vector e_hat for inference. Fine-tuning only affects G and D, and does nothing to the Embedder.
  2. If the driving landmark (face shape, etc.) is too different from the source landmark, the result image will look like the driving one.
    image

It would be helpful if you could share your results,
Best,
Jarvisss

@amil-rp-work

Hey @Jarvisss
Wow, you got some amazing results!! Congratulations on such an amazing job.
A few questions:

  • Were you able to replicate the results from the paper?
  • If possible can you please share your model_weights?

@phquanta

I'm curious whether it is possible to share the weights, or at least lossG over training?

I'm seeing a big mismatch between Vincent's losses and yours: his lossG is around 10, yours around 100. Also, I'm seeing losses of 0 for the Discriminator with your suggested changes vs normal losses with Vincent's code.

@phquanta

phquanta commented May 4, 2021

Edit: I ran the meta-training for ~8 epochs on the voxceleb2 dev dataset

After 8 epochs?

@HAN-oQo

HAN-oQo commented Sep 12, 2021

Hi, @Jarvisss @vincent-thevenin
I'm now having a hard time reproducing the result.
I'm working with 2 RTX 2080 Ti GPUs and batch size 12 (total 6).
And I'm using half of the VoxCeleb2 data.
I'm not sure my model is training well. The following are my sample results after 8 epochs and the loss log.

Screenshot 2021-09-12 8:38:41 PM

Screenshot 2021-09-12 8:39:40 PM

Screenshot 2021-09-12 8:42:44 PM

Do you think the model is training well? For how many epochs did you train the model?
Did you also see this kind of loss log while training?
In my opinion, the discriminator is too strong, so it seems to have converged already.

I would appreciate your reply!
(This is my repo: https://github.com/hanq0212/Few_Shot-Neural_Talking_Head)

@lvZic

lvZic commented Nov 25, 2021

Have you reproduced the result successfully?

@HAN-oQo

HAN-oQo commented Dec 3, 2021

Hi, @lvZic
After longer training (more than 15 days), I could get a much better result.
But I still can't reproduce the paper's result perfectly.

@vuthede

vuthede commented Sep 30, 2022

Hi @Jarvisss,
Have you resolved the

cannot keep the source identity, the image generated seems to be a different person from the input image

or

If the driving landmark(face shape, etc) is too different from the source landmark, the result image would be like the driving one

I have the same problem. I have trained on only 5000 videos, and am at the 4th epoch so far.
