SSD - mobilenet_v2 Invalid loss, terminating training

Hi
I’m trying to train SSD mobilenet_v2 on my own dataset. It looks like the training is not stable: I keep getting “Invalid loss, terminating training”, usually within the first 8-10 epochs.
I’ve tried different batch sizes (2, 4, 8 and 16), as suggested in a similar post, but that did not resolve the problem.
I managed to finish 80 epochs only once, after many reruns.

Epoch 10/100
5015/5111 [============================>.] - ETA: 8s - loss: nan           Batch 5014: Invalid loss, terminating training
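The wording of this message matches what Keras’s `TerminateOnNaN` callback prints when a batch loss becomes NaN or infinite, so the trainer is presumably using that callback or an equivalent check. The pure-Python sketch below mimics the check. The function name `first_invalid_batch` and the sample loss values are hypothetical. It shows that a single non-finite batch loss is enough to stop the run. The practical fix is therefore to stop the loss from diverging, not to change the check.

```python
import math


def first_invalid_batch(batch_losses):
    """Return the index of the first batch whose loss is NaN or infinite, else None.

    Hypothetical helper that mirrors the per-batch check done by Keras's
    TerminateOnNaN callback, which prints the same message and stops training.
    """
    for batch, loss in enumerate(batch_losses):
        if not math.isfinite(loss):  # catches NaN as well as +/-inf
            print(f"Batch {batch}: Invalid loss, terminating training")
            return batch
    return None  # every batch produced a finite loss


if __name__ == "__main__":
    # Made-up loss values: the third batch diverges to NaN.
    first_invalid_batch([1.2, 0.9, float("nan"), 0.5])
```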

ssd_train_mobilenet_v2_kitti.txt (1.8 KB)

Any ideas?
Thanks


Please fine-tune max_learning_rate along with the batch size (bs). For example, try:

max_learning_rate: 1e-2
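This is a minimal sketch of why max_learning_rate matters. It assumes the SSD spec uses a soft-start annealing schedule: the rate ramps up from min_learning_rate to max_learning_rate, holds there, then decays back. The exponential ramp and decay shape, the function name `soft_start_annealing_lr`, and the values min_lr=5e-5, soft_start=0.15 and annealing=0.8 are assumptions for illustration only. They may not match the toolkit’s actual implementation or your spec file. Under this assumption, max_learning_rate is the peak rate held for most of training, so lowering it caps the largest step size the optimizer takes.

```python
def soft_start_annealing_lr(progress, min_lr, max_lr, soft_start, annealing):
    """Hypothetical soft-start/annealing learning-rate schedule.

    progress is the fraction of training completed, in [0, 1].
    Assumed shape: geometric ramp min_lr -> max_lr over [0, soft_start),
    constant max_lr over [soft_start, annealing), then geometric decay
    max_lr -> min_lr over [annealing, 1].
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if not 0.0 < soft_start < annealing < 1.0:
        raise ValueError("need 0 < soft_start < annealing < 1")
    if not 0.0 < min_lr <= max_lr:
        raise ValueError("need 0 < min_lr <= max_lr")

    if progress < soft_start:
        t = progress / soft_start
        return min_lr * (max_lr / min_lr) ** t  # warm-up phase
    if progress < annealing:
        return max_lr  # plateau at the peak rate
    t = (progress - annealing) / (1.0 - annealing)
    return max_lr * (min_lr / max_lr) ** t  # annealing phase


if __name__ == "__main__":
    # Peak rate taken from the suggestion above; other values are hypothetical.
    for p in (0.0, 0.075, 0.15, 0.5, 0.9, 1.0):
        print(p, soft_start_annealing_lr(p, 5e-5, 1e-2, 0.15, 0.8))
```

The geometric ramp and decay were chosen because they move smoothly across several orders of magnitude between min_lr and max_lr. A linear ramp would also be a reasonable assumption.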

It seems that the problem is solved. Thanks!