- Author:
  - Mingxing Tan (Google Brain)
  - Quoc V. Le (Google Brain)
- Paper Link
- Mixing multiple kernel sizes within a single depthwise convolution improves both accuracy and efficiency.
- Each kernel size has a different receptive field, so each channel group produces feature maps that capture patterns at a different scale; a minimal sketch of this mixed-kernel convolution is shown below.
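Below is a minimal PyTorch sketch of the mixed-kernel (MixConv) idea, assuming an even channel split across the kernel sizes; the class name `MixConv` and the default kernel sizes are illustrative and not the repository's exact implementation.

```python
import torch
import torch.nn as nn


class MixConv(nn.Module):
    """Depthwise convolution with a different kernel size per channel group (sketch)."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7), stride=1):
        super().__init__()
        # Split the channels as evenly as possible across the kernel sizes.
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)
        self.splits = splits
        # One depthwise convolution per group, each with its own kernel size.
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, k, stride=stride, padding=k // 2, groups=c, bias=False)
            for c, k in zip(splits, kernel_sizes)
        )

    def forward(self, x):
        # Each chunk is convolved with its own receptive field, then re-joined.
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)


if __name__ == "__main__":
    out = MixConv(24)(torch.randn(1, 24, 32, 32))
    print(out.shape)  # torch.Size([1, 24, 32, 32])
```

Because only a fraction of the channels use the largest kernel, this mixing costs fewer parameters than applying the largest kernel to every channel.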
Datasets | Model | Acc1 | Acc5 | Parameters (My Model, Paper Model)
---|---|---|---|---
CIFAR-10 | MixNet-s (WORK IN PROGRESS) | 92.82% | 99.79% | 2.6M, -
CIFAR-10 | MixNet-m (WORK IN PROGRESS) | 92.52% | 99.78% | 3.5M, -
CIFAR-10 | MixNet-l (WORK IN PROGRESS) | 92.72% | 99.79% | 5.8M, -
IMAGENET | MixNet-s (WORK IN PROGRESS) | - | - | 4.1M, 4.1M
IMAGENET | MixNet-m (WORK IN PROGRESS) | - | - | 5.0M, 5.0M
IMAGENET | MixNet-l (WORK IN PROGRESS) | - | - | 7.3M, 7.3M
python main.py
- `--data` (str): the ImageNet dataset path
- `--dataset` (str): dataset name (example: CIFAR10, CIFAR100, MNIST, IMAGENET)
- `--batch-size` (int)
- `--num-workers` (int)
- `--epochs` (int)
- `--lr` (float): learning rate
- `--momentum` (float): momentum
- `--weight-decay` (float): weight decay
- `--print-interval` (int): training log print interval
- `--cuda` (bool)
- `--pretrained-model` (bool): whether to use the pretrained model
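An illustrative invocation using the options above (the dataset choice and hyperparameter values are examples only, not recommended settings):

python main.py --dataset CIFAR10 --batch-size 128 --epochs 100 --lr 0.1 --momentum 0.9 --weight-decay 1e-4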
- Distributed SGD
- ImageNet experiment