Accuracy reached only 59.12% with batch size 2048 (epochs 0-13), 1024 (epochs 13-22), and 512 (epochs 22-24), then dropped to 56% after epoch 25; possible overfitting #256

leo23ui · 2024-12-25T06:01:44Z

Hello, I trained with batch size 2048 for epochs 0-13, batch size 1024 for epochs 13-22, and batch size 512 for epochs 22-24, but the accuracy reached only 59.12%. I then continued training for epoch 25, and the accuracy dropped to 56%, so it seems that overfitting has occurred. How can I reach an accuracy of 63.5%? Should I use batch size 512 from epoch 0? I hope to get your advice. Thanks very much!

export NNODES=1
export GPUS_PER_NODE=1
export WANDB__SERVICE_WAIT=60
export CUDA_VISIBLE_DEVICES=5

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES"
torchrun $DISTRIBUTED_ARGS src/training/main.py \
    --save-frequency 1 \
    --report-to wandb \
    --train-data /home/gg/gg/MQBench-main/test/model/e1/split_2tar \
    --dataset-type webdataset \
    --imagenet-val ./ImageNet \
    --warmup 2000 \
    --batch-size 512 \
    --epochs 25 \
    --workers 16 \
    --model TinyCLIP-ViT-39M-16-Text-19M \
    --name exp_name \
    --seed 0 \
    --local-loss \
    --grad-checkpointing \
    --output ./outputs/bb \
    --lr 0.0001 \
    --gather-with-grad \
    --pretrained-image-file ViT-B-16@openai \
    --pretrained-text-file ViT-B-16@openai \
    --distillation-teacher ViT-B-32@laion2b_e16 \
    --norm_gradient_clip 5 \
    --train-num-samples 15000000 \
    --logit-scale 50
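
For reference, here is a rough sketch of how many optimizer steps this batch-size schedule amounts to, compared with using batch size 512 for all 25 epochs. It rests on two assumptions that are not confirmed by the training code: that each epoch draws `--train-num-samples` samples, so steps per epoch is roughly `ceil(samples / (batch_size * world_size))`, and that the schedule means 13 epochs at 2048, 9 epochs at 1024, and 3 epochs at 512. The names `SCHEDULE`, `steps_per_epoch`, and `total_steps` are made up for illustration.

```python
import math

TRAIN_NUM_SAMPLES = 15_000_000  # --train-num-samples from the command above
WORLD_SIZE = 1                  # NNODES * GPUS_PER_NODE from the command above

# Assumed reading of the schedule: (per-GPU batch size, number of epochs).
# Epochs 0-12 at 2048, epochs 13-21 at 1024, epochs 22-24 at 512.
SCHEDULE = [(2048, 13), (1024, 9), (512, 3)]


def steps_per_epoch(batch_size, num_samples=TRAIN_NUM_SAMPLES, world_size=WORLD_SIZE):
    """Approximate optimizer steps in one epoch (assumed formula)."""
    return math.ceil(num_samples / (batch_size * world_size))


def total_steps(schedule):
    """Sum the steps over every (batch_size, epochs) phase of a schedule."""
    return sum(steps_per_epoch(bs) * epochs for bs, epochs in schedule)


mixed_steps = total_steps(SCHEDULE)
constant_steps = total_steps([(512, 25)])

for bs, epochs in SCHEDULE:
    print(f"batch size {bs}: {steps_per_epoch(bs)} steps/epoch for {epochs} epochs")
print(f"mixed schedule total steps: {mixed_steps}")
print(f"batch size 512 for 25 epochs: {constant_steps}")
```

Under these assumptions the mixed schedule takes fewer than half as many optimizer steps as a constant batch size of 512, so the two runs differ in the number of updates as well as in batch size.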
