Runtime error with Jetson Xavier NX

I have a project that makes use of cuda/cudnn and runs well on Jetson Nano + carrier board A02. The same project compiles on Jetson Xavier NX + carrier board B01, but it reports the following runtime error:

Unhandled exception in worker thread: basic_string::_M_construct null not valid

I’m not sure where I should start tracing this issue. Here are the differences between the two setups:

=====Jetson Nano + Carrier Board A02=====
JP 4.2.3
CUDA Runtime version: 10.0.0
Cudnn version: 7.3.1
Latest version of CUDA supported by the driver: 10.0.0
GPU: NVIDIA Tegra X1
GPU memory: 3.87164 Gb
GPU clock frequency: 921.6 MHz
GPU compute capability: 5.3

===== Jetson Xavier NX + Carrier Board B01=====
JP 4.4
CUDA Runtime version: 10.2.0
Cudnn version: 8.0.0
Latest version of CUDA supported by the driver: 10.2.0
GPU: Xavier
GPU memory: 7.58946 Gb
GPU clock frequency: 1109 MHz
GPU compute capability: 7.2

Any advice is appreciated. Thanks.
Colin

Hi Colin,

Xavier’s GPU compute capability is 7.2 (sm_72).
So you will need to add the correct architecture to generate a kernel that can run on Xavier.
Could you check if sm_72 is added when compiling the app for Xavier NX?

Thank you kayccc. I added “-arch=compute_72 -code=sm_72” to nvcc but still got the same runtime error.

Hi kayccc, I wonder if Nvidia could provide an option to downgrade cuda/cudnn. Since Jetson Xavier AGX can run with a lower version of cuda/cudnn, I expect Xavier NX would have no problem either.

Thanks.
Colin

Hi,

There are some dependencies between libraries and OS.
So you cannot just downgrade the CUDA version without reflashing.

Unhandled exception in worker thread: basic_string::_M_construct null not valid

The error looks like a basic C++ error, not something from the CUDA side.
A possible cause is that the g++/gcc version was upgraded and some limitation leads to the issue.
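
To illustrate where a message like this can come from: with libstdc++ (the standard library used by g++), one common source is constructing a std::string from a null const char*. The following is only a minimal sketch and is not taken from Colin's project. The helper names to_string_or and construction_error are hypothetical, and the exception on null is libstdc++ behaviour (the C++ standard leaves this case undefined, so other standard libraries may crash instead).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: build a std::string from a C string that may be null.
// Returns the fallback instead of handing a null pointer to std::string.
std::string to_string_or(const char* s, const std::string& fallback = "") {
    return s ? std::string(s) : fallback;
}

// Hypothetical probe: tries the unguarded construction and returns the
// exception text, or an empty string if nothing was thrown.
// With libstdc++ a null argument ends up in the catch branch; this is
// implementation behaviour, not something the C++ standard guarantees.
std::string construction_error(const char* s) {
    try {
        std::string copy(s);  // unguarded: this is the risky call
        (void)copy;
        return "";
    } catch (const std::logic_error& e) {
        return e.what();
    }
}
```

A typical place for a null to appear is a lookup that differs between boards, for example std::getenv returning a null pointer for an unset variable. Something like to_string_or(std::getenv("MODEL_DIR"), "./models") would avoid the exception there (MODEL_DIR is a made-up name). Running the app under gdb with "catch throw" may also help show which call passes the null.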

Could you check if this page helps first?

Thanks.
A po

Hi AastaLLL,

Thanks for your advice. I tried compiling with clang++ 6.0, g++ 7, and g++ 8, and they all give the same runtime error. But if I don’t run my neural networks on the cuda/cudnn backend and run on CPU only, it works just fine. That seems to point to the cuda/cudnn runtime libraries, which may have something that is not thread safe.

This is what I have so far:

It works on Nano with JP 4.3/CUDA10.0/CUDNN7.3
It doesn’t work on NX with JP4.4/CUDA10.2/CUDNN8.0
It works on NX with JP4.4 as long as it does not use the CUDA/CUDNN backend

I’ll later try the cuda/cudnn backend using JP 4.3 on Xavier AGX and see how it goes.

Thanks.
Colin

Just found a work-around for this issue. Here are the steps:

[1] Keep cuda-10.2 but force remove libcudnn8 and libcudnn8-dev
[2] Follow this link to install libcudnn7 for cuda-10.2


[3] Now my neural network is able to run on cuda10.2/cudnn7 backend.

It looks to me like there may be something wrong with cudnn 8.0.

Thanks.
Colin

Hi,

Usually, we don’t recommend using packages from outside sdkmanager,
since most CUDA-related libraries have dependencies on the GPU driver.

This may limit your use of libraries such as TensorRT, TensorFlow, and deepstream.
Would you mind sharing your source with us so we can check it further?

Thanks.