Cannot reproduce the results #12
Hello @bowen-xiao96 ! I'd love to know more about the issues you encountered that make my code unreproducible for you. There are no hidden tricks per se. I followed what Egor Zakharov wrote in his paper and on social media, so Adam should be used without momentum. The generator should be pretty deep for good results; I used 17 residual blocks in total. What made the difference for me was the AdaIN parameters: there should be 2 parameters for each channel width in each normalization layer.
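For illustration, a minimal AdaIN block along those lines could look like the sketch below. The class name, channel count, and style dimension are placeholders rather than the repo's actual code, but it shows the two per-channel parameters (a scale and a bias) predicted from the style vector.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: one scale and one bias per channel
    (2 parameters per channel width), predicted from a style vector."""
    def __init__(self, num_channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.to_scale_bias = nn.Linear(style_dim, 2 * num_channels)

    def forward(self, x, style):
        # style: (N, style_dim) -> per-channel gamma and beta: (N, C, 1, 1)
        gamma, beta = self.to_scale_bias(style).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * self.norm(x) + beta

# Toy usage with made-up sizes:
ada = AdaIN(num_channels=256, style_dim=512)
out = ada(torch.randn(2, 256, 32, 32), torch.randn(2, 512))
```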
Hi @bowen-xiao96 , have you tried the code after the modification to the VGG loss you mentioned here? Can you reproduce the results of @vincent-thevenin after applying this modification?
Hi @vincent-thevenin , I tried your code for training but I can only get some blurry results (examples shown below) after training ~7 epochs on the VoxCeleb2 test set. Besides, when I use your provided pre-trained model as initialization, the generated results get worse as I train for more iterations with your code. So there may be some differences between your posted code and the training code used for that pre-trained model?
Hi @mikirui, I just made some further changes to how the gradient flow is handled. After fixing @bowen-xiao96 's error I did not test train.py immediately, so your blurriness might come from there. However, at the latest commit it should be working better. Can you please pull the latest changes and tell me if you see any improvement?
Hi @vincent-thevenin , I tried your latest code and the results (after ~2 epochs) are shown below: I also used the dataloader from https://github.com/grey-eye/talking-heads (which is faster) and ran about 5 epochs. The results are shown: The generated image quality is better than with the last version of the code, but I think the results are still much blurrier than the results shown in your README?
First of all, @vincent-thevenin, great work - I really appreciate it! @mikirui, it seems I did exactly what you did. Results generated using the latest repo code (train dataset, after ~1 epoch; it took ~22 hours on the hardware I have available). Maybe there is a chance that in the next 4 epochs it would converge to results similar to those you present in the README, but I do not think so. Results when the grey-eye/talking-heads data preprocessing and data loader are used: after 5 epochs (there are still a lot of artifacts and it certainly does not look like the README's results). And ... in the 10th epoch something goes really wrong and the results start to be "red". The loss for 25 epochs (in the 10th epoch a small increase of the loss is visible, but nothing major). Bottom line: I also tried your pre-trained model and I am getting really good results with it. I would love to be able to train a model achieving results similar to the pre-trained one. Do you have any suggestions or ideas? Thank you!
@vincent-thevenin Thanks for sharing this great repo. I'm trying this model with a larger training set (still a subset of the entire VoxCeleb2 dev set). During the first few steps it seems like I can see a human face like this. But the model collapses to completely black outputs very soon, at around 3k~5k steps. Has anyone run into similar problems? *Update: tried to train from scratch using the test set of VoxCeleb2 and got similar results. More information: Module versions: Environment: Any help would be appreciated. *Update 2: Now I run into the same situation as @OndrejTexler . Everything looks red after ~22k steps. The content loss and feature matching loss increase dramatically.
Has anyone successfully reproduced results like @vincent-thevenin 's checkpoint?
Got the same issue as @OndrejTexler . I tried to train the model (using the VoxCeleb2 training dataset); after 5000 iterations I used the checkpoint for inference and got "red" results.
20k steps on the vox dev set here. Although results around 10k still have color in them, around 15k steps everything appears to be red. I used a batch size of 4 along with K=8.
I got the same error as you, @brucechou1983. Have you found the reason?
@OndrejTexler @mikirui @brucechou1983 @rexxar-liang @Selimonder @yushizhiyao , Thank you for your feedback! After working on it, I successfully reversed the red outputs; as of now it looks like the model starts over as it recovers from the collapse. The problem seems to come from how I updated the weights in train.py: I updated the generator and the discriminator weights at the same time by calculating the gradient of the sum of lossG and lossD. I am training the model some more at the moment to check that the problem really comes from that and that I'm not mistaken.
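For anyone following along, the fix being described amounts to giving each network its own backward pass and optimizer step instead of backpropagating the summed loss once. A self-contained toy sketch (the networks, losses, and learning rates below are stand-ins, not the repo's actual training loop):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the generator and discriminator.
G = nn.Linear(8, 8)
D = nn.Linear(8, 1)
optimizerG = torch.optim.Adam(G.parameters(), lr=5e-5)
optimizerD = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(4, 8)
fake = G(torch.randn(4, 8))

# Generator step: only G's parameters are updated by optimizerG.
lossG = -D(fake).mean()
optimizerG.zero_grad()
lossG.backward()
optimizerG.step()

# Discriminator step on a detached fake, with its own gradients and optimizer.
lossD = torch.relu(1.0 - D(real)).mean() + torch.relu(1.0 + D(fake.detach())).mean()
optimizerD.zero_grad()
lossD.backward()
optimizerD.step()
```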
I have added a commit to the master branch. But you should check out the other branch; I am actively maintaining that one and will merge it soon. The second branch preprocesses the dataset and uses custom folder paths. I managed to decrease training time by 20x compared to the main branch.
Hi, @vincent-thevenin . Do you mean 2x?
@dimaischenko I went from some unsightly 240 hours/epoch on the full dataset on my setup to less than 12h. So indeed 20x :) Also the compressed preprocessed dataset is 17GB compared to 270GB for the full one.
@vincent-thevenin have you tried to train with the new loss and gotten good results? I tried for 1 epoch on the full dataset
@vincent-thevenin Thanks for sharing your findings. May I know the hardware requirement (GPU memory budget) to run your latest updates on the whole dataset?
Hello @vincent-thevenin! Did you achieve good results with the new loss?
@dimaischenko I got bad results as well. I experimented a bit, and disabling the adversarial loss and the feature matching loss to keep just the content loss creates outputs similar to what I would get on the main branch. I'm still looking into it and will post an update once I reach good results.
@brucechou1983 The model uses 8 GB of VRAM with a batch size of 2. I haven't tested with a batch size of 1, but if you're having memory problems, try reducing the batch size to 1 first.
Hey @vincent-thevenin I'm trying to use your latest branch, but I ran into some problems, mostly just understanding what some of your parameters are. What are path_to_Wi and path_to_preprocess? I assumed path_to_preprocess was the path to the VoxCeleb dataset, but when I plug in the correct path it just errors out.
Hi @vincent-thevenin , for the 20x faster training branch, do you mean the save_disc branch? Have you already gotten good results from that branch? Thanks
Hi @Bip3, path_to_Wi is the path to a folder that contains the discriminator vectors for each video. I started using the full dataset, and loading everything onto the GPU just consumes memory uselessly, so I save and load the vectors when necessary; the folder is filled when you initialize the Discriminator. path_to_preprocess is the folder where you save the preprocessed images after running preprocess.py. I will update the readme to make the changes clearer.
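For illustration, loading a per-video W_i vector from that folder could look like the sketch below. The file layout (one <video_id>.pt file per video) and the embedding size are assumptions for the example, not the repo's actual code.

```python
import os
import torch

def load_or_init_wi(path_to_Wi, video_id, embedding_dim=512):
    """Load the per-video discriminator vector W_i from disk, creating and
    saving a fresh one the first time a video is seen."""
    path = os.path.join(path_to_Wi, f"{video_id}.pt")
    if os.path.isfile(path):
        return torch.load(path)
    w_i = torch.randn(embedding_dim, 1)  # new vector for an unseen video
    os.makedirs(path_to_Wi, exist_ok=True)
    torch.save(w_i, path)
    return w_i
```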
@shiyingZhang90 this is my result for 15 epochs, still training it further at the moment.
@vincent-thevenin thanks so much for the update! Actually I don't think calculating the gradient of the sum of lossG and lossD is wrong according to the paper, hence I'm looking forward to great results from the updated code. BTW, I found that the way you calculate the content loss is different from another repo. Will that reference code help?
Hey @vincent-thevenin , thanks for the response. Could you give instructions on how you got your save_disc branch training? Edit: Figured it out. For anyone wondering, you must run preprocessing.py and save the output in the folder specified under path_to_preprocess in params. Everything else is pretty much the same as master.
Hi @vincent-thevenin, do you get better results after training for more epochs on the save_disc branch? I'm still wondering how to reach the training results in your demo.
@Bip3 @shiyingZhang90 Hello again, good news: I managed to produce great results with no collapse. Thank you for your patience. Here are some sample images:
Hey @vincent-thevenin.
@vincent-thevenin did you achieve this result with the current save_disc branch (commit: e461da8)?
@vincent-thevenin @brucechou1983 Hi, I used the current save_disc branch (commit: e43ca9f) and ran on the full dataset with K=8 and batch size 6. The results after 3 epochs: Have you seen this kind of result during training? Should I train for more epochs?
Hi, I have trained for another 10 epochs and got results like yours @vincent-thevenin. The loss_content kept going down, but loss_adv and loss_matching ended up going up over the training epochs. However, it seems the model cannot keep the source identity; the generated image seems to be a different person from the input image. I also have a question about the Loss_adv. I wonder if this is correct? Or should we multiply Loss_FM by 1/layers or something? The Pix2PixHD PyTorch implementation also uses the FM loss, and it does this normalization.
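For reference, the 1/layers normalization being asked about, sketched Pix2PixHD-style (the function below is illustrative, not taken from either repo):

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(feats_fake, feats_real):
    """L1 feature matching averaged over discriminator layers (Pix2PixHD-style).
    feats_fake / feats_real are lists of intermediate discriminator activations."""
    loss = 0.0
    for f_fake, f_real in zip(feats_fake, feats_real):
        loss = loss + F.l1_loss(f_fake, f_real.detach())
    return loss / len(feats_fake)  # the 1/layers normalization in question
```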
Hi @vincent-thevenin , Good news! The following results are generated from the same person (id_08696) with different driving videos. Click the images to view the video results on YouTube. 1. Feed forward without finetuning As we can see, an identity gap exists in the feed-forward results, which can be bridged by finetuning.
@kaahan However, in this repo, vgg19 and vggface take images in RGB order and normalized to [0-1], while keeping the loss weights the same as in the paper, i.e. So you should either change the weight of the content loss, or change the pretrained model to the caffe pretrained version, to make the final loss balanced. For me, I downloaded the caffe version of vgg19 from https://github.com/jcjohnson/pytorch-vgg. Main code:
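(The snippet below is a minimal sketch of that caffe-style conversion rather than the exact code: it turns an RGB tensor in [0, 1] into BGR, [0, 255], mean-subtracted input, using the standard ILSVRC channel means; the function name is illustrative.)

```python
import torch

# ImageNet channel means used by the caffe-trained VGG models, in BGR order.
VGG_MEAN_BGR = torch.tensor([103.939, 116.779, 123.68]).view(1, 3, 1, 1)

def to_caffe_vgg_input(x_rgb01):
    """Convert an (N, 3, H, W) RGB tensor in [0, 1] into caffe VGG input:
    BGR channel order, [0, 255] range, per-channel mean subtracted."""
    x_bgr = x_rgb01[:, [2, 1, 0], :, :] * 255.0
    return x_bgr - VGG_MEAN_BGR.to(x_rgb01.device)
```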
Edit: I ran the meta-training for ~8 epochs on the VoxCeleb2 dev dataset. Edit 2: I've created PR #56 for the update of the VGG loss calculation.
Hi @kaahan,
It would be helpful if you could share your results,
Hey @Jarvisss
|
I'm curious, is it possible to share weights, or at least lossG vs. training steps? I'm seeing a big mismatch between Vincent's losses and yours: his lossG is around 10, yours around 100. Also I'm seeing losses of 0 for the discriminator with your suggested changes vs. normal losses in Vincent's code.
After 8 epochs?
Hi, @Jarvisss @vincent-thevenin Do you think the model is training well? How many epochs did you train the model for? I would appreciate your reply!
Have you reproduced the results successfully?
Hi, @lvZic |
Hi @Jarvisss,
or
I got the same problem. I trained on only 5000 videos and am on the 4th epoch so far.
Hello! I have also tried to reproduce this paper myself. However, with a very similar network architecture and AdaIN settings, I can only achieve low-resolution faces placed on a very fuzzy background. Training for more epochs does not improve performance. I am amazed by your example images, but they cannot be reproduced with your GitHub code. After carefully reading all the code, I found that there may be a mistake in https://github.com/vincent-thevenin/Realistic-Neural-Talking-Head-Models/blob/master/loss/loss_generator.py#L30
The output.data operation seems to be cutting off the gradient flow, making the content loss useless.
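A toy illustration of the point (not the repo's code): a forward hook that stores output.data gets a tensor with no grad_fn, so a loss built from it sends no gradient back, whereas storing output itself keeps the graph intact.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
x = torch.randn(1, 4, requires_grad=True)

captured = {}
def hook(module, inputs, output):
    captured["detached"] = output.data  # .data silently drops the autograd graph
    captured["attached"] = output       # keeps grad_fn, so gradients can flow

layer.register_forward_hook(hook)
layer(x)

print(captured["detached"].requires_grad)  # False -> a loss on this gives no gradient
print(captured["attached"].requires_grad)  # True  -> gradients reach the generator
```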
I would also like to ask whether there are any other hidden tricks in training, for example the Adam momentum or the depth of the generator. Thank you!