I’m trying to train SSD mobilenet_v2 on my own dataset, but the training is not stable. I keep getting “Invalid loss, terminating training”, usually within the first 8-10 epochs.
I’ve tried different batch sizes (2, 4, 8 and 16), as suggested in a similar post, but that did not resolve the problem.
I managed to finish 80 epochs only once, after many reruns.
Epoch 10/100
5015/5111 [============================>.] - ETA: 8s - loss: nan
Batch 5014: Invalid loss, terminating training
ssd_train_mobilenet_v2_kitti.txt (1.8 KB)