Reproducing the Results, and Questions #7
Comments
Hi @drcdr, thank you for your interest in our work and for so many good questions. I will try to answer a few of them, and Xin will comment on the technical details later.
1/3. Will be answered by Xin. The difference between 2.76% and 2.50% is indeed somewhat significant.
5/6. Batch size seems to be an important factor for both speed and stability. We will run more experiments as soon as possible. So far we have only run on our V100 GPUs and estimated the time on a 1080Ti, which seems less accurate. We have also seen some evidence suggesting the importance of batch size. We will provide solutions for 12GB GPUs later.
4/7. For a fair comparison, we did not change this setting. One more data point: we need only 1 day (1 V100) to train CIFAR-10/100 on a searched architecture. Maybe Xin knows more about why you needed 3 days.
Your further questions and comments are very welcome.
Hi @drcdr, thanks for the comments. The following are some technical details of our experiments.
Hi @198808xc, @chenxin061 - thanks for the great, detailed responses. Based on this, I'll do some more investigation on my side; it may take a week or two, and then I'll follow up with what I find. Thanks!
I have a related question: how are you calculating the number of trainable parameters in the model? I wrote a quick utility, and I matched your 3.4M number for PDARTS when --auxiliary is False, but I get a higher number (3.91M) when --auxiliary is True:
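Something along these lines (a minimal version of the utility; the auxiliary-head filter assumes the DARTS-style convention, where those parameter names contain "auxiliary"):

import numpy as np

def count_params_M(model, skip_auxiliary=False):
    # Sum the element counts of all trainable parameters, in millions.
    total = 0
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # Optionally exclude the auxiliary tower (DARTS-style naming assumed).
        if skip_auxiliary and 'auxiliary' in name:
            continue
        total += np.prod(p.size())
    return total / 1e6

With skip_auxiliary=True this reproduces the 3.4M figure; with skip_auxiliary=False it also counts the auxiliary head, giving the larger number.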
@drcdr Yes, if the auxiliary tower is included, the parameter count will be larger. However, the auxiliary tower is used for network training, not testing, so we do not take those extra parameters into account for the testing phase. In fact, you will get the same test accuracy without the auxiliary tower at test time.
Well, for some reason PyTorch crashed at iteration #551 with --auxiliary. I'm trying to figure out whether warm restarts can be easily implemented. It looks like just CosineAnnealingLR() and torch.optim.SGD() would be affected (as well as torch.load'ing the checkpoint and setting up the model from the state_dict)?
Yes, you can recover the training from the saved checkpoint.
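A minimal sketch of the save/resume logic (the file name and epoch bookkeeping here are illustrative, not the exact code in this repo); the key point is that the optimizer and scheduler state_dicts must be saved and restored along with the model's:

import torch

def save_checkpoint(model, optimizer, scheduler, epoch, path='checkpoint.pt'):
    # Persist everything needed to resume: model weights, the SGD momentum
    # buffers, and the CosineAnnealingLR position in its schedule.
    torch.save({
        'epoch': epoch,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'scheduler': scheduler.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, scheduler, path='checkpoint.pt'):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt['model'])
    optimizer.load_state_dict(ckpt['optimizer'])
    scheduler.load_state_dict(ckpt['scheduler'])
    return ckpt['epoch'] + 1  # first epoch of the resumed run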
@chenxin061 OK, here's an update (thanks for your feedback).
Results:
@drcdr I think the experimental results you got evaluating CIFAR-10 are acceptable.
Hi @chenxin061, I'm reproducing your ImageNet results. I trained your model based on the DARTS code; here are my training log and model file: https://drive.google.com/open?id=1br4IPnHCV-zUHJkEGXPwXnsl6288yhFy . The final accuracy is 73.92%. I double-checked our code; the difference is that you use a cosine-decayed LR scheduler, while I use StepLR following DARTS. I use a batch size of 256, a starting LR of 0.1, and 8 GPUs, while you use a batch size of 1024 and a starting LR of 0.5. Did you try training your model with the StepLR scheduler, and how was the performance?
@D-X-Y I haven't tried ImageNet training yet. Am I reading/understanding this right: did your 250-epoch ImageNet training take 11 days, using 8 GPUs?! Also, it looks like you used the PDARTS genotype, so I guess you were trying to see how your run compared to the 24.4% top-1 test error number? (Also, I guess your batch size per GPU was only 32?)
@D-X-Y We did not try the StepLR scheduler with the PDARTS genotype. The results reported in our paper were obtained with the linear scheduler, and we obtained similar test accuracy with the cosine scheduler. We are re-training the DARTS genotype with the linear and cosine schedulers and will report the test accuracy here and in the next version of our paper.
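For reference, a minimal sketch of the three schedules discussed here (the StepLR step size and gamma are illustrative, not the exact DARTS or paper settings, and only one scheduler would be attached in a real run):

import torch
import torch.nn as nn

epochs = 250
model = nn.Linear(10, 10)  # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)

# Pick exactly one of these per run:
# 1) Step decay: multiply the LR by gamma every step_size epochs.
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)
# 2) Cosine decay: LR follows half a cosine from the initial value toward 0.
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
# 3) Linear decay: LR falls linearly to 0 over the course of training.
sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda e: 1.0 - e / epochs)

for epoch in range(epochs):
    # ... train one epoch with opt ...
    sched.step()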
@drcdr Yes, 8 GPUs, with a batch size per GPU of 32. I'm trying to get 24.4% top-1 test error. @chenxin061 Thanks for your reply; I also look forward to your results. I will try DARTS with your training strategy after the NIPS deadline :)
@chenxin061 Thanks for your results! I'm also training DARTS and other NAS models with the cosine scheduler.
I got 95.95% accuracy using the PDARTS genotype in genotypes.py without changing anything. (GPU: Tesla V100-SXM2-32G)
@Margrate Maybe you missed the --cutout and --auxiliary options.
I ran it again with the --cutout and --auxiliary options added, and got 97.01% accuracy.
I think there must be some hidden difference; the expected validation accuracy is about 97.50% with the correct setting. You can also refer to issue #9, where the retraining valid_acc reached 97.52% at epoch 557.
It seems the genotype PDARTS in this line is different from the one reported in Figure 3(c). Can you confirm that the released genotype (above) was giving you 97.5%?
@arash-vahdat For me, see the PDARTSAux96 line in the table above (from May 16). My final error there was 2.56%, and the genotype I was using looks the same as the one you are referencing.
@drcdr Thanks for the reproduction. |
I am trying to reproduce the results of PDARTS, which looks like it provides awesome performance. Congratulations!
Everything here is CIFAR-10. I didn't make any significant source-code modifications; all other arguments are the defaults from the repository as of Apr 30. (I did hard-code directory names.)
Here are the labels for what I ran (Windows 10, PyTorch nightly from 4/30/2019, 2x Titan XP):
1) PDARTS: Just train, rerunning the (default) PDARTS genotype in genotypes.py:
python train_cifar.py --cutout
2) pdarts-BS64: Search and train, but using batch size 64, since the Titan XP is memory-limited.
python train_search.py --add_layers 6 --add_layers 12 --dropout_rate 0.1 --dropout_rate 0.4 --dropout_rate 0.7 --batch_size 64
pdarts64 = Genotype(normal=[('skip_connect', 0), ('sep_conv_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 2), ('skip_connect', 0), ('dil_conv_5x5', 4)], normal_concat=range(2, 6), reduce=[('avg_pool_3x3', 0), ('avg_pool_3x3', 1), ('skip_connect', 1), ('sep_conv_5x5', 2), ('avg_pool_3x3', 0), ('dil_conv_3x3', 2), ('avg_pool_3x3', 0), ('dil_conv_3x3', 3)], reduce_concat=range(2, 6))
python train_cifar.py --arch=pdarts64 --cutout
Some Questions
Well, that's enough questions for now; I appreciate your time and consideration.
For reference, here is a plot of the learning rate and validation error for these two runs. The bold line is the result of filtfilt with a window filter of length 25.
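The smoothing was along these lines (a minimal sketch; a simple moving-average/boxcar window is assumed, since only the length is stated above):

import numpy as np
from scipy.signal import filtfilt

def smooth(values, window=25):
    # Zero-phase filtering: filtfilt runs the FIR moving-average filter
    # forward and backward, so the smoothed curve is not phase-shifted.
    b = np.ones(window) / window   # length-25 moving-average numerator
    a = [1.0]                      # pure FIR filter: trivial denominator
    return filtfilt(b, a, np.asarray(values, dtype=float))

# e.g.: smoothed = smooth(valid_error_per_epoch)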