My partner and I ran into the same problem. We tried training the R101 network after uncommenting the RPN. It is working so far (we are currently past 31K iterations). We agree that this differs from the training method in the CVPR VCRCNN paper. We suspect the backbone would not be trained well with the RPN removed, though we may be wrong.
To follow your instructions, I ran this command:
CUDA_VISIBLE_DEVICES=2 python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" --skip-test SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1 MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000
I am using the COCO 2017 train and val data.
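For context, the solver overrides in that command look like a linear rescaling of a larger-batch schedule. The sketch below is only an illustration: it assumes the 1x config defaults to 16 images per batch, a base learning rate of 0.02, 90000 iterations, and LR steps at (60000, 80000). Those baseline numbers are an assumption and should be checked against the YAML file; `scale_schedule` is a hypothetical helper, not part of the repository.

```python
def scale_schedule(base_batch, base_lr, base_max_iter, base_steps, new_batch):
    """Rescale a training schedule linearly for a smaller batch size.

    Hypothetical helper: the learning rate shrinks by the batch ratio,
    while the iteration count and LR step milestones grow by it.
    """
    factor = base_batch / new_batch
    return {
        "IMS_PER_BATCH": new_batch,
        "BASE_LR": base_lr / factor,
        "MAX_ITER": int(round(base_max_iter * factor)),
        "STEPS": tuple(int(round(step * factor)) for step in base_steps),
    }


# Assumed baseline (verify against the config file): 16 images per batch,
# LR 0.02, 90000 iterations, LR steps at 60000 and 80000.
scaled = scale_schedule(16, 0.02, 90000, (60000, 80000), new_batch=2)
print(scaled)
```

Under those assumed defaults, the result matches the values passed on the command line (batch 2, LR 0.0025, 720000 iterations, steps at 480000 and 640000), so the overrides themselves appear consistent with that scaling.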
During training, the loss stays around 8 and does not drop.
After 6000 steps, the model starts producing NaN loss.
Do you have any idea why the loss becomes NaN?
What is the problem?