Question about domain adaptation BatchNormalization #5
Comments
Yes, well, kind of.
Hi @Britefury, I'm wondering: so you end up only using the teacher model for testing data in the new domain, right? Since only the teacher model stores the mean and std of the data that we want the model to generalize to.
Hi @YilinLiu97, that's a good question. Let's see. The batch-norm layers in both models will maintain a running mean and variance. You noted in Issue #6 that my EMA implementation is buggy. My new code that I use in my current experiments is now this:
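A minimal sketch of what such a fixed EMA update can look like in PyTorch, assuming both the learnable weights and the batch-norm buffers (running_mean/running_var) are averaged; the function and argument names here are illustrative and may differ from the code actually used:

```python
import torch

@torch.no_grad()
def update_ema(student, teacher, alpha=0.99):
    # Exponential moving average of the learnable weights
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)
    # Also track the batch-norm buffers; without this the teacher keeps
    # stale running_mean / running_var values
    for t_b, s_b in zip(teacher.buffers(), student.buffers()):
        if t_b.dtype.is_floating_point:
            t_b.mul_(alpha).add_(s_b, alpha=1.0 - alpha)
        else:
            t_b.copy_(s_b)  # e.g. num_batches_tracked is an integer counter
```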
@Britefury, I tried it in my own experiments (with deep-copied weights at the beginning) and the EMA model doesn't do well on target data (!?). It behaved like a freshly initialized model, which is really weird. This happens even with alpha=0, where I checked that the EMA model has the same weights as the student model and the only difference is the running_mean/std, which is expected. I would expect the teacher model to do well at least in the target domain with its own running_mean/std, but it didn't. I think the initialization of the EMA shouldn't matter much as long as alpha slowly ramps up, just like the consistency loss (so that the teacher model quickly forgets the early weights); this is what I've done in my own experiments, but the results were as described above. Since alpha was also set to 0.99 in the original Mean Teacher implementation (and they didn't seem to deep-copy the weights at the beginning), I do wonder how the teacher model could lead to good performance.
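For context, a small sketch of the kind of alpha ramp-up described above; this follows roughly the schedule used in the original Mean Teacher code, and the function name is illustrative:

```python
def ema_alpha(step, alpha_max=0.99):
    # Ramp alpha up from 0 towards alpha_max so that the teacher quickly
    # forgets its early (near-random) weights, then tracks a slow average.
    return min(1.0 - 1.0 / (step + 1), alpha_max)
```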
Hi @Britefury, I'm wondering: is the EMA working well in your experiments? I mean, using the EMA model only for testing, on either source or target data. Thanks!
The only experiments I have tried are with the buggy EMA implementation that is in this codebase, so it may be that fixing it breaks things. The new 'fixed' EMA code I pasted above is used in other, newer experiments, but not these. I suppose I need to do a comparison!
Okay. I've made a modified version of the code (not putting it online yet) and I'm going to compare the EMA implementations and see what difference I get. I should have an answer in several days; it's running on an old GPU :)
Thank you! Looking forward to the results! :) |
@YilinLiu97 I have run the experiments, and using the fixed EMA makes no statistical difference as far as I can tell, at least on the MNIST -> SVHN experiment. I re-ran the baseline just in case running it on a machine with a smaller GPU, less RAM and a smaller batch size made a difference. The baseline accuracy is 96.998% +/- 0.056%.
The class DomainAdaptModule seems to maintain separate batch-normalization parameters for the source and target domains, but it looks like only the _init_bn_layers function is called in __init__; the save/restore functions for the source/target BN, such as bn_save_source, are never called anywhere else.
Could you please help me understand how your code maintains the separate source and target BN statistics, and how they are used in the test phase?
Thank you in advance.
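For illustration, a hypothetical sketch of the general pattern such save/restore helpers usually implement: snapshot the BN running statistics gathered on one domain and restore them before evaluating on that domain. The class and method names below are made up for the example and are not the repository's actual DomainAdaptModule API.

```python
import torch.nn as nn

class DomainBNStats:
    """Keep separate BN running statistics per domain (illustrative only)."""

    def __init__(self, net):
        self.bn_layers = [m for m in net.modules()
                          if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))]
        self.saved = {}

    def save(self, domain):
        # Snapshot the running statistics accumulated on this domain
        self.saved[domain] = [(bn.running_mean.clone(), bn.running_var.clone())
                              for bn in self.bn_layers]

    def restore(self, domain):
        # Load the statistics back before testing on this domain
        for bn, (mean, var) in zip(self.bn_layers, self.saved[domain]):
            bn.running_mean.copy_(mean)
            bn.running_var.copy_(var)
```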