Issue with migrating to pytorch 0.4 #29

mingyuliutw · 2018-07-27T17:20:29Z

We discovered a couple of issues (slower training speed and degraded output image quality) when migrating our code from pytorch 0.3 to 0.4. We are working on fixing them. For now, we recommend using munit_pytorch0.3.

mingyuliutw · 2018-07-27T23:51:26Z

Speed issue is now fixed in commit f972e42.

Cuky88 · 2018-07-28T11:02:57Z

After applying your changes, I get the following error when resuming training:

Traceback (most recent call last):
  File "train.py", line 64, in <module>
    iterations = trainer.resume(checkpoint_directory, hyperparameters=config) if opts.resume else 0
  File "/devel/MUNIT-master/MUNIT-master/trainer.py", line 186, in resume
    self.gen_a.load_state_dict(state_dict['a'])
  File "/opt/anaconda/lib/python2.7/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for AdaINGen:
        Unexpected key(s) in state_dict: "enc_content.model.0.norm.running_mean", "enc_content.model.0.norm.running_var", "enc_content.model.1.norm.running_mean", "enc_content.model.1.norm.running_var", "enc_content.model.2.norm.running_mean", "enc_content.model.2.norm.running_var", "enc_content.model.3.model.0.model.0.norm.running_mean", "enc_content.model.3.model.0.model.0.norm.running_var", "enc_content.model.3.model.0.model.1.norm.running_mean", "enc_content.model.3.model.0.model.1.norm.running_var", "enc_content.model.3.model.1.model.0.norm.running_mean", "enc_content.model.3.model.1.model.0.norm.running_var", "enc_content.model.3.model.1.model.1.norm.running_mean", "enc_content.model.3.model.1.model.1.norm.running_var", "enc_content.model.3.model.2.model.0.norm.running_mean", "enc_content.model.3.model.2.model.0.norm.running_var", "enc_content.model.3.model.2.model.1.norm.running_mean", "enc_content.model.3.model.2.model.1.norm.running_var", "enc_content.model.3.model.3.model.0.norm.running_mean", "enc_content.model.3.model.3.model.0.norm.running_var", "enc_content.model.3.model.3.model.1.norm.running_mean", "enc_content.model.3.model.3.model.1.norm.running_var".

How can I use an already trained model with this modification? Training from scratch works fine.
I guess this issue comes from the changed Layer Normalization?

Do you have any idea why the output quality is degraded?
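
Every unexpected key in the traceback above ends in `running_mean` or `running_var`, so one possible workaround (not confirmed by the maintainers) is to drop those entries from the old checkpoint before loading it. The sketch below is an assumption-laden illustration: `strip_running_stats` is a hypothetical helper, the `conv.weight` key is made up for the example, and plain strings stand in for tensors so the snippet runs without PyTorch.

```python
from collections import OrderedDict

# Suffixes of the buffers reported as unexpected in the traceback.
RUNNING_STAT_SUFFIXES = (".running_mean", ".running_var")


def strip_running_stats(state_dict):
    """Return a copy of state_dict without running-statistic entries.

    Hypothetical helper: the input is left untouched and key order is kept.
    """
    return OrderedDict(
        (key, value)
        for key, value in state_dict.items()
        if not key.endswith(RUNNING_STAT_SUFFIXES)
    )


# Stand-in for a loaded checkpoint entry such as state_dict['a'].
# Real values would be tensors; the conv.weight key is illustrative only.
old = OrderedDict([
    ("enc_content.model.0.conv.weight", "w"),
    ("enc_content.model.0.norm.running_mean", "m"),
    ("enc_content.model.0.norm.running_var", "v"),
])

cleaned = strip_running_stats(old)
print(list(cleaned))
```

The cleaned dictionary could then be passed to the `load_state_dict` call shown in the traceback. Passing `strict=False` to `load_state_dict` is another option, since it ignores keys that do not match, but it would also hide any other mismatch. Either way, the resumed model no longer uses the tracked statistics, so its outputs may differ from those of the run that produced the checkpoint.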

mingyuliutw · 2018-07-28T14:12:47Z

@Cuky88 The degraded performance after migrating to pytorch 0.4 is likely caused by an instance normalization parameter. We accidentally set track_running_stats=True in networks.py, which means the tracked means and variances are used at test time. However, this is NOT what we used when we developed the code. In the new commit we have set this argument to False. I think this will resolve the issue. I am verifying the hypothesis; once it is verified, I will add more details.
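
To make the effect of that flag concrete, here is a minimal pure-Python sketch of the two behaviors for a single channel of a single sample. It is not the code in networks.py: `instance_norm` is a hypothetical function, and the running statistics passed in are made-up values chosen only to show how the output changes when they do not match the sample.

```python
import math


def instance_norm(x, running_mean=None, running_var=None, eps=1e-5):
    """Normalize one channel of one sample, given as a flat list of floats.

    Without running statistics, the sample's own mean and (biased) variance
    are used. With them, the tracked values are used instead, which mimics
    test-time behavior when running statistics are tracked.
    """
    if running_mean is None or running_var is None:
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
    else:
        mean, var = running_mean, running_var
    return [(v - mean) / math.sqrt(var + eps) for v in x]


sample = [10.0, 12.0, 14.0]

# Per-instance statistics: the output is centered on zero.
own = instance_norm(sample)

# Tracked statistics that do not match this sample (made-up values):
# the output keeps the sample's offset and scale.
tracked = instance_norm(sample, running_mean=0.0, running_var=1.0)

print(own)
print(tracked)
```

With per-instance statistics every sample is normalized relative to itself, while tracked statistics apply one shared mean and variance to all samples at test time, so the two settings can produce noticeably different outputs for the same input.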

qilimk · 2018-07-31T15:48:27Z

@mingyuliutw I trained this model for 200,000 iterations several days ago, and it took almost 4 days. Your work looks great and I really want to reproduce the results.

How many images should the training set contain?
How long should I expect training for 1M iterations to take with the new code?

My GPU is a Tesla V100-SXM2 16GB.

qilimk · 2018-08-01T15:43:00Z

@Cuky88 I used 2,500 images as the training set, and the results did not look very good. I am now trying a new dataset with 50,000 images and hope to get better results.
How about your results? It looks like your iteration count is small.
