I recently trained a custom SSD-MobileNet model. Below is what the training graph looks like.
As we can see, there are lots of small peaks where the loss increases and then decreases again; there are 6-7 peaks like this. As I understand it, the loss should keep decreasing. Is there any reason for a graph like this?
Can you please explain how I can define the step size or other optimizations? I am only using train_ssd.py to train. The model is performing fine; I just wanted to understand the training graph, which is why I posted the question.
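For reference, one common meaning of "step size" in training is the interval at which a step-decay schedule lowers the learning rate. Here is a minimal sketch in plain Python, only to show the arithmetic. The function name `step_decay_lr` and all of the values are hypothetical and are not taken from train_ssd.py, whose own options may differ.

```python
def step_decay_lr(base_lr, epoch, step_size, gamma=0.1):
    """Learning rate at `epoch` under step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    All names and defaults here are illustrative assumptions.
    """
    return base_lr * gamma ** (epoch // step_size)


# Hypothetical settings: start at 0.01 and cut the rate 10x every 30 epochs.
for epoch in (0, 29, 30, 60):
    print(epoch, step_decay_lr(0.01, epoch, step_size=30))
```

A smaller `step_size` lowers the learning rate sooner, which usually gives a smoother but slower-moving loss curve late in training.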
Admittedly, I haven’t messed with these or run this for more than 100 epochs. I believe that may lead to overfitting your model (i.e. attaining the lowest possible loss on your training set doesn’t always generalize to better real-world performance).
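To illustrate the overfitting point, one common guard is to watch the loss on a held-out validation set and keep the checkpoint where it was lowest, instead of simply taking the last epoch. Below is a rough sketch in plain Python. The helper `best_epoch`, the `patience` parameter, and the loss values are all made up for illustration and are not part of train_ssd.py.

```python
def best_epoch(val_losses, patience=5):
    """Return (index, loss) of the best epoch seen before stopping.

    Scanning stops early once `patience` epochs have passed without
    a new lowest validation loss. Hypothetical helper for illustration.
    """
    best_idx, best_loss = 0, float("inf")
    for idx, loss in enumerate(val_losses):
        if loss < best_loss:
            # New lowest validation loss: remember this epoch.
            best_idx, best_loss = idx, loss
        elif idx - best_idx >= patience:
            # No improvement for `patience` epochs: stop looking.
            break
    return best_idx, best_loss


# Made-up validation losses, one per epoch.
val_losses = [3.0, 2.5, 2.2, 2.3, 2.4, 2.1, 2.6]
print(best_epoch(val_losses, patience=2))
print(best_epoch(val_losses, patience=5))
```

With a small `patience` the scan stops at the first dip, while a larger value tolerates temporary bumps in the curve before giving up, so the right setting depends on how noisy your validation loss is.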