Jetpack 4.4: Segmentation fault issue and slow inference time

Hi,

I just got a Xavier NX and installed JetPack 4.4 on it.

I copied my YOLOv3 project to the NX and ran it, but I got a segmentation fault.

The model was trained on a Linux PC, where it runs without problems with an inference time of about 0.2 s.

Here is what I got:

$ ./yolov3_detect.py
Inference Time: 0:00:03.272489
Segmentation fault (core dumped)

$ gdb python3
(gdb) run yolov3_detect.py
Starting program: /usr/bin/python3 yolov3_detect.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f6171c1f0 (LWP 9075)]
[New Thread 0x7f60f1b1f0 (LWP 9076)]
[New Thread 0x7f5f71a1f0 (LWP 9077)]
[New Thread 0x7f5cf191f0 (LWP 9078)]
[New Thread 0x7f5b7181f0 (LWP 9079)]
[New Thread 0x7f3b40a1f0 (LWP 9080)]
[New Thread 0x7f3ac091f0 (LWP 9081)]
[New Thread 0x7f3a4081f0 (LWP 9082)]
[New Thread 0x7f39c071f0 (LWP 9083)]
[New Thread 0x7f394061f0 (LWP 9084)]
[New Thread 0x7f42e551f0 (LWP 9085)]
Inference Time: 0:00:03.155869

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x0000007f19e1e280 in ?? ()
   from /usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8
(gdb) bt
#0  0x0000007f19e1e280 in ?? ()
   from /usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8
#1  0x0000007f19e3b8f0 in ?? ()
   from /usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8
Backtrace stopped: not enough registers or memory available to unwind further

On the NX, inference is much slower (about 3.2 s instead of 0.2 s) and the script ends with a segmentation fault.
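
Since the gdb backtrace stops inside libcudnn_ops_infer.so.8 without symbols, it can also help to see which Python line triggered the crash. A minimal sketch, assuming the two lines below are added at the top of yolov3_detect.py (the placement is an assumption, not something taken from the project):

```python
import faulthandler
import sys

# On SIGSEGV (and SIGFPE, SIGABRT, SIGBUS, SIGILL), dump the Python-level
# traceback of every thread. sys.__stderr__ is the interpreter's original
# stderr, used here in case sys.stderr has been redirected.
faulthandler.enable(file=sys.__stderr__, all_threads=True)
```

The same handler can be switched on without editing the script by running `python3 -X faulthandler yolov3_detect.py`.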

Jetpack: 4.4
pytorch: v1.1.0 (for python 3.6)
cuda: 10.2
cudnn: 8.0
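
The versions above can be read from the running interpreter, which makes it easy to compare the training PC with the NX. A small sketch, assuming PyTorch may or may not be importable on the machine (`describe_environment` is a hypothetical helper name):

```python
import platform


def describe_environment():
    """Collect architecture and library versions relevant to this report."""
    info = {
        "machine": platform.machine(),        # e.g. aarch64 on the NX, x86_64 on a PC
        "python": platform.python_version(),
    }
    try:
        import torch
    except ImportError:
        info["torch"] = None                  # PyTorch is not installed here
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda         # CUDA version PyTorch was built with
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```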

How can I solve this issue?
Thank you!

Hi,

We are trying to reproduce this issue in our environment.
Does this issue also occur with the default darknet based YOLOv3 model?

Thanks.

Hi,

Sorry, I have not tried the default darknet model on the Xavier NX.

I just got the board and tried running a model that was trained on a Linux PC.

Thanks!

Hi,

Would you mind sharing the source and model with us so we can check it further?
Also, does your source support the ARM environment?

Thanks.

Hi,

Can I have your email address so I can send the source and model to you?

I did not expect an architecture issue. I trained the model with PyTorch 1.4 on Linux x86_64. Is that why it shows the segmentation fault?

Thanks.

Hi mliu82,

You can send it via the private message function.

Hi all,

The issue has been solved.
The cause was that the model was trained with PyTorch on another Linux PC and needs to be optimized with TensorRT before running on the Xavier NX. As a beginner, I had not optimized my YOLOv3 model properly, which led to the segmentation fault and the slow inference time.
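
For anyone landing here with the same problem, one common route (not necessarily the exact steps I used) is to export the PyTorch model to ONNX and then build a TensorRT engine with trtexec on the NX itself. A rough sketch, assuming a 1x3x416x416 input, a model already loaded on the CPU, and trtexec at its usual JetPack location; the function names and file names are hypothetical:

```python
import shlex

# Assumed location of trtexec on a JetPack install; adjust if yours differs.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def export_to_onnx(model, onnx_path, input_shape=(1, 3, 416, 416)):
    """Export an already-loaded PyTorch model (on the CPU) to an ONNX file.

    The input shape is an assumption; use the resolution the model was trained with.
    """
    import torch  # imported lazily so the helper below works without PyTorch

    model.eval()
    dummy_input = torch.randn(*input_shape)
    torch.onnx.export(model, dummy_input, onnx_path)


def build_trtexec_command(onnx_path, engine_path, fp16=True):
    """Return the trtexec argument list that builds an engine from an ONNX file."""
    command = [TRTEXEC, "--onnx=" + onnx_path, "--saveEngine=" + engine_path]
    if fp16:
        command.append("--fp16")  # allow half-precision kernels
    return command


if __name__ == "__main__":
    # Print the command so it can be copied into a shell on the NX.
    args = build_trtexec_command("yolov3.onnx", "yolov3.engine")
    print(" ".join(shlex.quote(arg) for arg in args))
```

The engine should be built on the device that will run it, since TensorRT engines are specific to the GPU and TensorRT version they were built with.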

Thank you all for responding!