Explanation of the custom model training on Jetson Nano

Hi

I have collected my own dataset using the camera capture tool and started training with a batch size of 1 for 200 epochs. I have 40 images in the training set and 10 images each for testing and validation. During training, the terminal shows several different types of losses. I want to understand what these losses mean and how I can improve them:

For example:

Epoch 141, Step: 10/11, Avg Loss: 2.4043, Avg Regression Loss: 0.657, Avg Classification Loss: 0.5467
Epoch 141, Validation Loss: 3.766, Validation Regression Loss: 0.7865, Validation Classification Loss: 2.364

Can anyone explain what all these losses mean and how I can bring the loss as close to 0.0 as possible so that the model detects accurately? Do I need to add more images to the dataset?

I have also noticed that no matter how long the training runs, the loss doesn’t get close to 0. At some epochs it is around 2.5, then it decreases to 2.0, but in the following epochs it goes back up to 2.5 or 2.7.

As I understand it, the loss should keep decreasing as training continues and should not increase. Can anyone please explain this?

Hi,

The explanation of loss depends on which library and implementation you use for training.
Are you running it with jetson-inference?

In general, 40 images is a small dataset, though how many you need depends on the complexity of the scenario.
Since the model is updated step by step along the gradient direction, it’s possible for the loss to go up in some epochs.
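
To illustrate why the loss can go up, here is a toy sketch in plain Python (not the train_ssd.py code). It fits a single number to two made-up targets using one sample per step, which mimics a batch size of 1. The function name, targets, and learning rate are all hypothetical and chosen only for illustration.

```python
def sgd_loss_trace(targets, w0, lr, steps):
    """Fit a single number w to `targets` with batch-size-1 gradient descent.

    Returns the mean squared error over ALL targets before training
    and after each update step.
    """
    def mean_loss(w):
        return sum((w - t) ** 2 for t in targets) / len(targets)

    w = w0
    trace = [mean_loss(w)]
    for step in range(steps):
        t = targets[step % len(targets)]  # one sample per step (batch size 1)
        grad = 2.0 * (w - t)              # gradient of (w - t)^2 for that sample only
        w -= lr * grad                    # move against the gradient
        trace.append(mean_loss(w))
    return trace


# Hypothetical toy data: two targets that "disagree" about the best w.
trace = sgd_loss_trace(targets=[4.0, 0.0], w0=5.0, lr=0.25, steps=3)
print(trace)  # [13.0, 10.25, 4.0625, 5.265625]
```

Each step improves the fit for the one sample it looked at, but the last step makes the overall loss worse (4.0625 to 5.265625) because that sample pulls the model away from the other one. The same effect, at a much larger scale, is one reason the reported loss can bounce up and down between epochs.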

Thanks.

Hi AastaLLL

Yes, I am using jetson-inference with train_ssd.py to train the model. How can I decrease the loss? Also, could you explain what batch size we should use for training?

Thanks

Hi @ART97, since object detection tasks require that objects both be located correctly (i.e. the bounding box coordinates) and classified correctly (i.e. labelled), there are two losses for those purposes. The classification loss indicates the error the model has when recognizing objects, and the regression loss indicates the bounding box errors.
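
As a rough illustration of the two parts, here is a minimal sketch in plain Python for a single predicted box. It assumes a smooth L1 loss for the box coordinates and a softmax cross-entropy for the class, which is a common choice in SSD-style detectors. The function names are hypothetical, and the real training code works on many boxes per image and has more details that are not shown here.

```python
import math


def smooth_l1(pred_box, true_box):
    """Regression part: smooth L1 summed over the 4 box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        diff = abs(p - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total


def cross_entropy(logits, true_class):
    """Classification part: softmax cross-entropy for one box."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_class]


def detection_loss(pred_box, true_box, logits, true_class):
    """Return (total, regression, classification) for one box."""
    reg = smooth_l1(pred_box, true_box)
    cls = cross_entropy(logits, true_class)
    return reg + cls, reg, cls


# Hypothetical example: box slightly off, class scores favour the right class.
total, reg, cls = detection_loss(
    pred_box=[0.1, 0.2, 0.6, 0.9],
    true_box=[0.1, 0.2, 0.5, 0.8],
    logits=[0.2, 2.5, -1.0],  # scores for 3 made-up classes
    true_class=1,
)
print(f"total={total:.4f} regression={reg:.4f} classification={cls:.4f}")
```

A perfectly placed box gives a regression loss of 0, and a very confident correct class score pushes the classification loss towards 0, so each number tells you which part of the task the model is struggling with.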

In general these losses should decrease over time; however, it’s not realistic to expect them to reach zero. What loss value corresponds to good accuracy is dataset-dependent, so you may or may not find your current losses acceptable already. As Aasta mentioned though, you may need more data in your dataset.
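
Since the numbers bounce around from epoch to epoch, it can help to look at the whole run instead of a single line. Below is a small sketch that parses validation lines in the format posted above and reports the epoch with the lowest validation loss. The regular expression is only an assumption based on those two log lines, the helper names are hypothetical, and the epoch 142 values are made up for the example.

```python
import re

# Pattern assumed from the log lines quoted in this thread.
VAL_LINE = re.compile(
    r"Epoch:?\s*(\d+), Validation Loss: ([\d.]+), "
    r"Validation Regression Loss: ([\d.]+), "
    r"Validation Classification Loss: ([\d.]+)"
)


def parse_validation(lines):
    """Collect one record per validation line; other lines are skipped."""
    records = []
    for line in lines:
        m = VAL_LINE.search(line)
        if m:
            records.append({
                "epoch": int(m.group(1)),
                "total": float(m.group(2)),
                "regression": float(m.group(3)),
                "classification": float(m.group(4)),
            })
    return records


def best_epoch(records):
    """Return the record with the lowest total validation loss."""
    return min(records, key=lambda r: r["total"])


log = [
    # First two lines are copied from the post above.
    "Epoch 141, Step: 10/11, Avg Loss: 2.4043, Avg Regression Loss: 0.657, Avg Classification Loss: 0.5467",
    "Epoch 141, Validation Loss: 3.766, Validation Regression Loss: 0.7865, Validation Classification Loss: 2.364",
    # Made-up line, for illustration only.
    "Epoch 142, Validation Loss: 3.5, Validation Regression Loss: 0.7, Validation Classification Loss: 2.1",
]

records = parse_validation(log)
best = best_epoch(records)
print(best["epoch"], best["total"])  # 142 3.5
```

Comparing the validation numbers against the training numbers for the same epoch is also useful: a validation loss that stays well above the training loss is a common hint that the dataset is too small for the model to generalize.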

Regarding the batch size, it isn’t super important for the accuracy of the model; rather, a larger batch size speeds up training. Increasing the batch size also increases memory consumption, so if you are training on Jetson Nano, you can just use a batch size of 1 due to the limited memory.