Setup information:
• Hardware Platform (Jetson / GPU): Jetson Nano
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): 4.4
• TensorRT Version: 7.1.3
• Issue Type (questions, new requirements, bugs): Question
Hello,
I have set up a DeepStream application that uses a YOLO model for inference (using marcoslucianops' implementation, which can be found here: GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.1 / 6.0.1 / 6.0 configuration for YOLO models).
This application runs in a DeepStream Docker container: nvcr.io/nvidia/deepstream-l4t, version 5.0.1-20.09-samples.
In general, this application runs fine on multiple Jetson Nano devices with the same setup.
However, one Jetson Nano device fails to load the DeepStream inference model.
When the application starts and the TensorRT engine file is loaded into the nvinfer module, a segmentation fault occurs as follows:
Thread 1 "application" received signal SIGSEGV, Segmentation fault.
0x0000007fb7fdc1d4 in elf_machine_rela_relative (reloc_addr_arg=0x7f52a34000, reloc=0x7f5ef2c000, l_addr=546847277056) at ../sysdeps/aarch64/dl-machine.h:376
376 ../sysdeps/aarch64/dl-machine.h: No such file or directory.
The gdb backtrace shows the following:
#0 0x0000007fb7fdc1d4 in elf_machine_rela_relative (reloc_addr_arg=0x7f52f25000, reloc=0x7f5f41d000, l_addr=546852458496) at ../sysdeps/aarch64/dl-machine.h:376
#1 0x0000007fb7fdc1d4 in elf_dynamic_do_Rela (skip_ifunc=0, lazy=0, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=0x5578371de0) at do-rel.h:112
#2 0x0000007fb7fdc1d4 in _dl_relocate_object (scope=<optimized out>, reloc_mode=reloc_mode@entry=0, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:258
#3 0x0000007fb7fe2a1c in dl_open_worker (a=0x7fffffa398) at dl-open.c:382
#4 0x0000007fb7728694 in __GI__dl_catch_exception (exception=0xfffffffffffffffe, operate=0x7fffffa1bc, args=0x7fffffa380) at dl-error-skeleton.c:196
#5 0x0000007fb7fe2418 in _dl_open (file=0x7fb003ca80 "libcudnn_cnn_infer.so.8", mode=-2147483646, caller_dlopen=0x7fb002cc24 <cudnnCreateConvolutionDescriptor+156>, nsid=-2, argc=1, argv=0x7ffffff1f8, env=<optimized out>) at dl-open.c:605
#6 0x0000007fb75f5014 in dlopen_doit (a=0x7fffffa658) at dlopen.c:66
#7 0x0000007fb7728694 in __GI__dl_catch_exception (exception=0x7fb7ffe7a8 <__stack_chk_guard>, exception@entry=0x7fffffa5f0, operate=0x7fffffa44c, args=0x7fffffa5d0) at dl-error-skeleton.c:196
#8 0x0000007fb7728738 in __GI__dl_catch_error (objname=0x555589f400, errstring=0x555589f408, mallocedp=0x555589f3f8, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:215
#9 0x0000007fb75f6780 in _dlerror_run (operate=operate@entry=0x7fb75f4fb0 <dlopen_doit>, args=0x7fffffa658, args@entry=0x7fffffa668) at dlerror.c:162
#10 0x0000007fb75f50e8 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#11 0x0000007fb002cc24 in cudnnCreateConvolutionDescriptor () at /usr/lib/aarch64-linux-gnu/libcudnn.so.8
#12 0x0000007f92f5e028 in nvinfer1::rt::cuda::CudnnConvolutionRunner::allocateContextResources(nvinfer1::rt::CommonContext const&, nvinfer1::rt::ExecutionParameters&) ()
at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#13 0x0000007f92f1eb14 in nvinfer1::rt::SafeExecutionContext::setDeviceMemoryInternal(void*) () at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#14 0x0000007f92f23f78 in nvinfer1::rt::SafeExecutionContext::SafeExecutionContext(nvinfer1::rt::SafeEngine const&, bool) () at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#15 0x0000007f92ca9614 in nvinfer1::rt::ExecutionContext::ExecutionContext(nvinfer1::rt::Engine const&, bool) () at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#16 0x0000007f92ca98a0 in nvinfer1::rt::Engine::createExecutionContext() () at /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
#17 0x0000007fb041ea94 in () at /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infer.so
#18 0x0000007fb03fd45c in () at /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infer.so
#19 0x0000007fb03fdef0 in () at /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infer.so
#20 0x0000007fb03ff6cc in () at /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infer.so
#21 0x0000007fb04000a0 in createNvDsInferContext(INvDsInferContext**, _NvDsInferContextInitParams&, void*, void (*)(INvDsInferContext*, unsigned int, NvDsInferLogLevel, char const*, void*)) ()
at /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infer.so
#22 0x0000007fb07829c4 in () at /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#23 0x0000007fb7258224 in () at /usr/lib/aarch64-linux-gnu/libgstbase-1.0.so.0
The failing Jetson Nano uses the same Docker image that works fine on the other Jetson Nano devices, which are set up identically (flashed with the same JetPack), so the problem appears to be specific to this device's installation.
TensorRT version is 7.1.3.
CUDA version is 10.2.
cuDNN version is 8.0.
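Since the crash happens while the dynamic linker relocates libcudnn_cnn_infer.so.8, one check we are considering is comparing the on-disk cuDNN libraries between a working device and the failing one, to rule out a corrupted library file. A minimal sketch (the library path assumes the JetPack 4.4 default install location):

```shell
# Checksum the cuDNN 8 libraries so the output can be diffed
# against a known-good Jetson Nano flashed with the same JetPack.
# The path below assumes the JetPack 4.4 default install location.
for lib in /usr/lib/aarch64-linux-gnu/libcudnn*.so.8*; do
  if [ -f "$lib" ]; then
    md5sum "$lib"
  fi
done
echo "checksum scan complete"
```

Running this on both a working and the failing device and diffing the output would show whether the failing device's libcudnn_cnn_infer.so.8 differs from a known-good copy.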
Looking at the backtrace, the fault occurs inside cudnnCreateConvolutionDescriptor(), specifically while it dlopens libcudnn_cnn_infer.so.8 and the dynamic linker relocates that library.
Do you have a suggestion on how we can resolve this issue?
Thank you for your time.