Error training with jetson-inference

user122459 · April 2, 2022, 1:17am

Hello,
I am trying to train a model with the jetson-inference example and export it, but I get this error:

I noticed that something odd is happening: the average loss, classification loss, and regression loss are all printed as “nan” halfway through the epoch. I am not sure why this is happening.

Please help

dusty_nv · April 4, 2022, 7:00pm

Hi @user122459, normally onnx_export.py will select your best model (in the case of SSD models, the one with the lowest loss in the filename). However, since the loss is NaN, it is unable to do this. So you can run it manually like so:

$ python3 onnx_export.py --input=models/detections/mb1-ssd-Epoch-0-Loss-nan.pth --labels=models/detections/labels.txt --output=models/detections/ssd-mobilenet.onnx
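To illustrate why a NaN loss in the filename breaks automatic selection, here is a minimal sketch of picking the checkpoint with the lowest finite loss. It is not the code used by onnx_export.py; the helper names `parse_loss` and `best_checkpoint` are hypothetical, and the filename pattern is assumed from the `mb1-ssd-Epoch-0-Loss-nan.pth` example above.

```python
import math
import re

# Assumed filename pattern, taken from the example "mb1-ssd-Epoch-0-Loss-nan.pth".
LOSS_PATTERN = re.compile(r"-Loss-(?P<loss>[^/\\]+)\.pth$")


def parse_loss(filename):
    """Return the loss encoded in a checkpoint filename, or None if absent/unparseable."""
    match = LOSS_PATTERN.search(filename)
    if match is None:
        return None
    try:
        # float("nan") and float("inf") parse fine; they are filtered out later.
        return float(match.group("loss"))
    except ValueError:
        return None


def best_checkpoint(filenames):
    """Return the filename with the lowest finite loss, or None if there is none."""
    candidates = []
    for name in filenames:
        loss = parse_loss(name)
        if loss is not None and math.isfinite(loss):
            candidates.append((loss, name))
    if not candidates:
        return None
    return min(candidates)[1]


if __name__ == "__main__":
    print(best_checkpoint(["mb1-ssd-Epoch-0-Loss-nan.pth"]))
```

With only NaN checkpoints in the folder, there is no finite loss to compare, so nothing can be selected and the checkpoint has to be passed explicitly with --input.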

However, this issue with the inf/nan losses will mean that your model is unlikely to detect your objects correctly. Typically you want to debug which item(s) in your training dataset are causing the inf/nan loss. To do this, I recommend uncommenting this line of code:

https://github.com/dusty-nv/pytorch-ssd/blob/3f9ba554e33260c8c493a927d7c4fdaa3f388e72/vision/datasets/voc_dataset.py#L76

And then running train_ssd.py with the options --batch-size=1 --workers=1 --debug-steps=1
The image ID that gets printed out directly before the inf/nan loss is the one that is causing the issue.
You can then drill down and inspect that image’s XML file to see if anything is awry (or remove that image from the dataset).
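When inspecting an annotation file, a small script can save some manual reading. The sketch below assumes the standard Pascal VOC XML layout (`size`, `object`, `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`) and flags a few things that may be worth a look: missing objects, unparseable coordinates, boxes with zero or negative size, and boxes outside the image. These checks are assumptions about what "awry" might mean, not a confirmed cause of the NaN loss in this thread, and `find_box_problems` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def _to_float(text):
    """Parse a number from XML text, returning None if it is missing or invalid."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def find_box_problems(xml_text):
    """Return a list of human-readable problems found in one VOC annotation."""
    root = ET.fromstring(xml_text)
    problems = []

    # Image dimensions are optional here; bounds checks are skipped without them.
    width = _to_float(root.findtext("size/width"))
    height = _to_float(root.findtext("size/height"))
    has_size = width is not None and height is not None and width > 0 and height > 0
    if not has_size:
        problems.append("image size is missing or not positive")

    objects = root.findall("object")
    if not objects:
        problems.append("no <object> entries")

    for index, obj in enumerate(objects):
        name = (obj.findtext("name") or "").strip()
        label = f"object {index} ({name!r})"
        coords = [_to_float(obj.findtext(f"bndbox/{tag}"))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        if any(value is None for value in coords):
            problems.append(f"{label} has missing or non-numeric coordinates")
            continue
        xmin, ymin, xmax, ymax = coords
        if xmax <= xmin or ymax <= ymin:
            problems.append(f"{label} has zero or negative size")
        if has_size and (xmin < 0 or ymin < 0 or xmax > width or ymax > height):
            problems.append(f"{label} lies outside the image")
    return problems


if __name__ == "__main__":
    sample = """<annotation>
      <size><width>640</width><height>480</height></size>
      <object><name>cup</name>
        <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>100</xmax><ymax>200</ymax></bndbox>
      </object>
    </annotation>"""
    for problem in find_box_problems(sample):
        print(problem)
```

To check the file for the image ID printed before the NaN loss, read that file's text and pass it to `find_box_problems`; an empty list means none of these particular checks fired, not that the annotation is necessarily fine.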

system · April 27, 2022, 6:09am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
