question about training #23

Open
wanghxcis opened this issue Jun 12, 2018 · 10 comments

Comments

wanghxcis commented Jun 12, 2018

Hi guys,
I collected some winter and summer images and resized them to 256*256. When I train on these images, I find the vgg_w parameter hard to tune. If the value is large, the output image quality is OK, but I see very little translation effect; the output is almost the same as the input. However, when the value is small, the output is blurry even after 1,000,000 iterations. What should I do? Enlarge mlp_dim, or something else?

OValery16 commented Jul 13, 2018

From my understanding, the domain-invariant perceptual loss should only be used to accelerate training for inputs >= 512 x 512. My guess is that 256 x 256 images are too small.

Check page 8 of their paper, which gives more details.

Could the authors please confirm my guess?

Thanks,

mingyuliutw (Collaborator) commented

@OValery16 Yes, we find that the domain-invariant perceptual loss is useful for large images. For an image resolution of 256x256, we do not use the domain-invariant perceptual loss.
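For context, here is a minimal sketch of what the domain-invariant perceptual loss computes, as described in the paper: the MSE between instance-normalized VGG features of the translated and the reference image. This sketch assumes the VGG features have already been extracted; the exact VGG layer and preprocessing follow the released code, not this snippet. Setting vgg_w: 0 in the config simply skips this term.

```python
import torch
import torch.nn.functional as F

def domain_invariant_perceptual_loss(feat_fake: torch.Tensor,
                                     feat_real: torch.Tensor) -> torch.Tensor:
    # Instance-normalize the VGG feature maps of the translated image and the
    # real image before taking the MSE, so the distance is less sensitive to
    # domain-specific feature statistics (hence "domain-invariant").
    return F.mse_loss(F.instance_norm(feat_fake), F.instance_norm(feat_real))
```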

mingyuliutw (Collaborator) commented

@wanghxcis Could you send me some example images and training results?

OValery16 commented Jul 14, 2018

@mingyuliutw I read your paper and got a bit confused. In which case do you use the explicit style-augmented cycle consistency loss?
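For readers finding this later, here is a rough sketch of what the explicit style-augmented cycle consistency term (weighted by recon_x_cyc_w in the config) computes. The gen_a.encode / gen_b.decode interface here is an assumption modeled on the released generators, not a verbatim copy of the trainer:

```python
import torch
import torch.nn.functional as F

def style_augmented_cycle_loss(gen_a, gen_b, x_a, style_dim=8):
    # Translate x_a to domain B with a randomly sampled style, translate the
    # result back to domain A with x_a's original style, and penalize the
    # difference from the original image.
    c_a, s_a = gen_a.encode(x_a)                                       # content/style of x_a
    s_b = torch.randn(x_a.size(0), style_dim, 1, 1, device=x_a.device) # random style for B
    x_ab = gen_b.decode(c_a, s_b)                                      # A -> B translation
    c_ab, _ = gen_b.encode(x_ab)                                       # re-encode the translation
    x_aba = gen_a.decode(c_ab, s_a)                                    # back-translate with s_a
    return F.l1_loss(x_aba, x_a)                                       # weighted by recon_x_cyc_w
```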

wanghxcis (Author) commented

@mingyuliutw Thanks, I will get rid of the perceptual loss and try again. My training parameters are as follows:
image_save_iter: 1000 # How often do you want to save output images during training
image_display_iter: 100 # How often do you want to display output images during training
display_size: 8 # How many images do you want to display each time
snapshot_save_iter: 10000 # How often do you want to save trained models
log_iter: 1 # How often do you want to log the training stats

max_iter: 1000000 # maximum number of training iterations
batch_size: 1 # batch size
weight_decay: 0.0001 # weight decay
beta1: 0.5 # Adam parameter
beta2: 0.999 # Adam parameter
init: kaiming # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001 # initial learning rate
lr_policy: step # learning rate scheduler
step_size: 100000 # how often to decay learning rate
gamma: 0.5 # how much to decay learning rate
gan_w: 1.5 # weight of adversarial loss
recon_x_w: 9 # weight of image reconstruction loss
recon_s_w: 1 # weight of style reconstruction loss
recon_c_w: 1 # weight of content reconstruction loss
recon_x_cyc_w: 1 # weight of explicit style augmented cycle consistency loss
vgg_w: 0.6 # weight of domain-invariant perceptual loss
gen:
  dim: 64 # number of filters in the bottommost layer
  mlp_dim: 256 # number of filters in MLP
  style_dim: 8 # length of style code
  activ: relu # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 2 # number of downsampling layers in content encoder
  n_res: 4 # number of residual blocks in content encoder/decoder
  pad_type: reflect # padding type [zero/reflect]
dis:
  dim: 64 # number of filters in the bottommost layer
  norm: none # normalization layer [none/bn/in/ln]
  activ: lrelu # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 3 # number of layers in D
  gan_type: lsgan # GAN loss [lsgan/nsgan]
  num_scales: 2 # number of scales
  pad_type: reflect # padding type [zero/reflect]

input_dim_a: 3 # number of image channels [1/3]
input_dim_b: 3 # number of image channels [1/3]
num_workers: 8 # number of data loading threads
new_size: 256 # first resize the shortest image side to this size
crop_image_height: 256 # random crop image of this height
crop_image_width: 256 # random crop image of this width
data_root: ./datasets/summer2winter_fineselect256/ # dataset folder location
Training example result (attached image: gen_a2b_train_current)
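For reference, a minimal sketch (not the released trainer code) of how these weights could combine into the total generator objective; with vgg_w set to 0 the perceptual term simply drops out:

```python
def total_gen_loss(losses, w):
    # losses: per-term loss values (floats or tensors);
    # w: the weights from the YAML config above.
    total = (w['gan_w']         * losses['adv'] +
             w['recon_x_w']     * losses['recon_x'] +
             w['recon_s_w']     * losses['recon_s'] +
             w['recon_c_w']     * losses['recon_c'] +
             w['recon_x_cyc_w'] * losses['cyc'])
    if w.get('vgg_w', 0) > 0:   # vgg_w: 0 disables the perceptual term entirely
        total = total + w['vgg_w'] * losses['vgg']
    return total
```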

zhangmozhe commented

Hi, does the domain-invariant perceptual loss affect the image quality for size 256*256?

Lucky0775 commented

@mingyuliutw I read your paper and got a bit confused. In which case do you use the explicit style-augmented cycle consistency loss?

Hi, did you solve this problem? I have the same confusion.

wylblank commented Apr 9, 2024

(quoting @wanghxcis's comment and training configuration above)

Hello, I ran into the same problem. How did you solve it?

wylblank commented Apr 9, 2024

(quoting the original question about tuning vgg_w at 256*256 above)

Hello, I also resized the images to 256*256, but in this case, if vgg_w is not 0, the loss becomes NaN. Do you have any suggestions? Thank you very much!

wylblank commented

(quoting the original question about tuning vgg_w at 256*256 above)

Hello, I am trying to reproduce MUNIT, but I encountered `for (src, dst) in zip(vgglua.parameters()[0], vgg.parameters()): TypeError: 'NoneType' object is not callable`. I tried to modify it, but the generated vgg16.weight does not work and the vgg_w loss term becomes NaN. Have you ever encountered this? Or could you please send me the vgg16.weight from your models folder? Thanks a million.
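In case it helps: torch.utils.serialization.load_lua was removed from newer PyTorch releases, which is one common reason the Lua vgg16.t7 conversion step fails with an error like the one above. A possible workaround, offered as an assumption rather than the repo's official fix, is to use torchvision's ImageNet-pretrained VGG16 as the frozen perceptual feature extractor instead of the converted Lua weights; the VGG preprocessing in the trainer would then need to be adapted to match torchvision's expected normalization.

```python
import torch
from torchvision import models

# Workaround sketch (an assumption, not the repo's official fix): use
# torchvision's ImageNet-pretrained VGG16 instead of the Lua-converted
# vgg16.weight as the perceptual feature extractor.
vgg = models.vgg16(pretrained=True).features.eval()  # or weights=... in newer torchvision
for p in vgg.parameters():
    p.requires_grad = False  # the VGG is only used as a frozen feature extractor

def vgg_features(x: torch.Tensor) -> torch.Tensor:
    # x should be normalized the way torchvision's VGG expects; the trainer's
    # own VGG preprocessing (written for the Lua weights) must be adapted.
    return vgg(x)
```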
