Getting CUDA Error while running inference on Jetson Nano 2GB

I trained a detection model on custom data and converted it to ONNX. The training images are 640x480.
During the ONNX conversion, the width and height were set to 300.
When I run the model, I get the following error:
“[TRT] …/rtSafe/cuda/genericReformat.cu (1294) - Cuda Error in executeMemcpy: 1 (invalid argument)
[TRT] FAILED_EXECUTION: std::exception
[TRT] failed to execute TensorRT context on device GPU”
The retrained network was ssd-mobilenet.
I followed the instructions in the Hello AI World video tutorial.
From another forum topic, it seems this could be a pointer issue. Can anyone kindly help?
Thanks & Regards,
Dipankar Sil

Hi,

Since Nano 2GB has limited resources, would you mind checking if this error occurs due to out of memory?
You can check this by monitoring the device with tegrastats while the inference is running.
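
If it helps, the memory fields of the tegrastats output can be pulled out with a few lines of Python. This is only a rough sketch: the function names are hypothetical, and the regular expression assumes the `RAM used/totalMB ... SWAP used/totalMB` layout seen on Jetson Nano, which may differ on other L4T releases.

```python
import re

# Assumed layout of one tegrastats line (fields before/after may vary):
# "RAM 1645/1972MB (lfb 14x4MB) SWAP 147/5082MB (cached 2MB) CPU [...] ..."
_MEM_RE = re.compile(r"RAM (\d+)/(\d+)MB.*?SWAP (\d+)/(\d+)MB")


def parse_tegrastats_memory(line):
    """Return RAM and swap usage in MB, or None if the fields are not found."""
    match = _MEM_RE.search(line)
    if match is None:
        return None
    ram_used, ram_total, swap_used, swap_total = map(int, match.groups())
    return {
        "ram_used": ram_used,
        "ram_total": ram_total,
        "swap_used": swap_used,
        "swap_total": swap_total,
    }


def free_memory_mb(line):
    """Return (free RAM, free swap) in MB for one tegrastats line, or None."""
    mem = parse_tegrastats_memory(line)
    if mem is None:
        return None
    return (mem["ram_total"] - mem["ram_used"],
            mem["swap_total"] - mem["swap_used"])
```

Feeding each line printed by tegrastats through `free_memory_mb` while the model runs shows whether free RAM and swap drop toward zero just before the failure.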

If the memory doesn’t reach its maximum, could you share the ONNX model with us for further investigation?
Thanks.

Hi AastaLL,
I checked as per your recommendation, and the memory usage doesn’t max out: about 300MB of RAM and approx 4GB of swap remain free. Here’s the output of tegrastats while executing the model on the Jetson Nano 2GB. I am also attaching the ONNX file for further help: mb1-ssd-Epoch-9-Loss-inf.onnx (25.9 MB)
“RAM 1645/1972MB (lfb 14x4MB) SWAP 147/5082MB (cached 2MB) CPU [25%@1428,27%@1428,25%@1428,23%@1428] EMC_FREQ 0% GR3D_FREQ 60% PLL@30C CPU@33C PMIC@100C GPU@32C AO@38.5C thermal@32.5C
RAM 1645/1972MB (lfb 14x4MB) SWAP 147/5082MB (cached 2MB) CPU [16%@1479,29%@1479,26%@1479,25%@1479] EMC_FREQ 0% GR3D_FREQ 68% PLL@30.5C CPU@33C PMIC@100C GPU@32C AO@38.5C thermal@32.25C
RAM 1646/1972MB (lfb 14x4MB) SWAP 147/5082MB (cached 2MB) CPU [16%@1479,28%@1479,31%@1479,28%@1479] EMC_FREQ 0% GR3D_FREQ 16% PLL@30.5C CPU@33C PMIC@100C GPU@32C AO@38.5C thermal@32.5C”

Hi @dipankar123sil, it seems there was some issue during training, because this model has infinite loss (note the “Loss-inf” in the file name). To start, you might want to try training it with a lower learning rate to keep the gradients from exploding and the loss from overflowing. You may also want to limit the training at first to an easier subset of your dataset to verify that the model is working.
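
As a quick sanity check before exporting, the loss embedded in the checkpoint name can be tested for a non-finite value. The snippet below is a minimal sketch with hypothetical helper names; it assumes the `...-Epoch-<n>-Loss-<value>.<ext>` naming seen in the attached file and is not part of the training scripts.

```python
import math
import re

# Assumed naming pattern, taken from "mb1-ssd-Epoch-9-Loss-inf.onnx".
_CKPT_RE = re.compile(r"Epoch-(\d+)-Loss-(.+)\.(?:pth|onnx)$")


def checkpoint_loss(filename):
    """Return (epoch, loss) parsed from a checkpoint name, or None."""
    match = _CKPT_RE.search(filename)
    if match is None:
        return None
    try:
        loss = float(match.group(2))  # float() accepts "inf" and "nan"
    except ValueError:
        return None
    return int(match.group(1)), loss


def has_finite_loss(filename):
    """True only if the name parses and its loss is neither inf nor NaN."""
    parsed = checkpoint_loss(filename)
    return parsed is not None and math.isfinite(parsed[1])


def best_checkpoint(filenames):
    """Return the name with the lowest finite loss, or None if there is none."""
    usable = [name for name in filenames if has_finite_loss(name)]
    return min(usable, key=lambda name: checkpoint_loss(name)[1], default=None)
```

With the attached file, `has_finite_loss("mb1-ssd-Epoch-9-Loss-inf.onnx")` is false, which suggests the model should be retrained rather than exported.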
