I need some help debugging GStreamer blacklisting DeepStream plugins in my Docker container. I’m not running JetPack, but rather an OS called Balena OS. I believe the issue is related to the installation rather than the OS.
The OS is built on L4T 32.6.1, and I’m using a barebone TX2NX base image provided by Balena. The top layers of the image are shown below. We are using CUDA 10.2, TensorRT 8.0.1 and DeepStream 6.0.0.
The installation seems straightforward and is very similar to the installation on JetPack 4.6, apart from downloading the BSP and applying the debs manually.
For some reason, GStreamer is blacklisting some of the DeepStream plugins. I’ve manually checked with ldd that the dependencies of the shared libraries exist, and they all show up fine. I also checked the output of ldconfig -p to ensure the GStreamer plugins exist, and they do. I’ve attached the output of GST_DEBUG=4 gst-inspect-1.0 -b, run after removing the cache file in /root/.cache/gstreamer-1.0: inspect-output.txt (151.5 KB)
There are a few stderr messages about nvbuf_utils: Could not get EGL display connection, but the DISPLAY env var is already unset, and this device does not have any display output anyway.
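For reference, the checks described above can be sketched roughly as follows. The helper names missing_deps and check_plugin are made up for illustration; the commands inside them are the ones mentioned in this post. One caveat worth keeping in mind: ldd only lists link-time dependencies, so a library that is loaded at run time with dlopen would not show up as missing here.

```shell
#!/bin/sh
# Hypothetical helper: keep only the "not found" lines from ldd-style output on stdin.
missing_deps() {
    grep 'not found' || true
}

# Hypothetical helper: repeat the checks from this post for one plugin file.
check_plugin() {
    plugin="$1"
    # Report link-time dependencies that the dynamic linker cannot resolve
    ldd "$plugin" | missing_deps
    # Show which libEGL entries are present in the linker cache
    ldconfig -p | grep -i 'libEGL' || echo "no libEGL entries in the ldconfig cache"
    # Remove the registry cache so GStreamer rescans the plugins
    rm -rf /root/.cache/gstreamer-1.0
    # List the blacklisted plugins with verbose logging
    GST_DEBUG=4 gst-inspect-1.0 -b
}

# Usage (on the device):
# check_plugin /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
```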
Any help pointing towards where to look into fixing this issue would be appreciated. Thanks!
I also tried using gdb to see what’s causing the segfault. It seems to be missing something called pthread_mutex_lock.c?
root@6caf369:/# gdb /usr/bin/gst-inspect-1.0
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/gst-inspect-1.0...(no debugging symbols found)...done.
(gdb) set args /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
(gdb) run
Starting program: /usr/bin/gst-inspect-1.0 /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
nvbufsurftransform: Could not get EGL display connection
Program received signal SIGSEGV, Segmentation fault.
0x0000007fb7ac26e4 in __GI___pthread_mutex_lock (mutex=0x70404030103010e) at pthread_mutex_lock.c:67
67 pthread_mutex_lock.c: No such file or directory.
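A note on the last line of that output: "pthread_mutex_lock.c: No such file or directory" only means gdb could not find the glibc source file to display, not that a library is missing. The more useful information is usually the call stack leading into pthread_mutex_lock. A minimal sketch for capturing it non-interactively is below; bt_on_crash is a made-up helper name, and it assumes gdb is installed in the container.

```shell
#!/bin/sh
# Hypothetical helper: run a command under gdb in batch mode and, once it
# stops (for example on SIGSEGV), print a backtrace for every thread.
bt_on_crash() {
    gdb -batch -ex run -ex 'thread apply all bt' --args "$@"
}

# Usage (on the device):
# bt_on_crash gst-inspect-1.0 /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
```

The frames above __GI___pthread_mutex_lock in that backtrace should show which shared library passed in the bad mutex pointer.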
Here’s the output of running valgrind gst-inspect-1.0 /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
The segfaults and core dumps seem to be coming from libEGL_mesa. Still investigating.
root@6caf369:/# valgrind gst-inspect-1.0 /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
==888== Memcheck, a memory error detector
==888== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==888== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==888== Command: gst-inspect-1.0 /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
==888==
nvbufsurftransform: Could not get EGL display connection
nvbufsurftransform: Could not get EGL display connection
==888== Warning: set address range perms: large range [0x100000000, 0x1ef690000) (noaccess)
==888== Warning: set address range perms: large range [0xf00000000, 0xfef690000) (noaccess)
==888== Warning: set address range perms: large range [0x8ce9000, 0x28ce8000) (noaccess)
==888== Syscall param ioctl(generic) points to uninitialised byte(s)
==888== at 0x4BD860C: ioctl (ioctl.S:26)
==888== by 0x8C5A2FB: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so)
==888== by 0x8C547DB: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so)
==888== by 0x7DF6D57: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==888== by 0x7CBB143: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==888== by 0x7D46DAF: cuDeviceGetAttribute (in /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1)
==888== by 0x6267A1B: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvbufsurftransform.so.1.0.0)
==888== Address 0x1ffeffe2dc is on thread 1's stack
==888==
nvbufsurftransform: Could not get EGL display connection
==888== Warning: noted but unhandled ioctl 0x4e04 with no size/direction hints.
==888== This could cause spurious value errors to appear.
==888== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==888== Invalid read of size 8
==888== at 0x8D08310: ??? (in /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0.0.0)
==888== Address 0x5ef3888 is 8 bytes inside a block of size 48 free'd
==888== at 0x4846D58: free (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888== Block was alloc'd at
==888== at 0x4847B0C: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888==
==888== Invalid write of size 8
==888== at 0x8D07FFC: ??? (in /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0.0.0)
==888== Address 0x5ef38a0 is 32 bytes inside a block of size 48 free'd
==888== at 0x4846D58: free (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888== Block was alloc'd at
==888== at 0x4847B0C: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888==
==888== Invalid read of size 8
==888== at 0x8D08040: ??? (in /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0.0.0)
==888== Address 0x5ef3898 is 24 bytes inside a block of size 48 free'd
==888== at 0x4846D58: free (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888== Block was alloc'd at
==888== at 0x4847B0C: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888==
==888== Invalid write of size 8
==888== at 0x8D08048: ??? (in /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0.0.0)
==888== Address 0x5ef38a8 is 40 bytes inside a block of size 48 free'd
==888== at 0x4846D58: free (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888== Block was alloc'd at
==888== at 0x4847B0C: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888==
==888== Invalid free() / delete / delete[] / realloc()
==888== at 0x4846D58: free (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888== Address 0x5ef3880 is 0 bytes inside a block of size 48 free'd
==888== at 0x4846D58: free (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888== Block was alloc'd at
==888== at 0x4847B0C: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==888==
==888==
==888== HEAP SUMMARY:
==888== in use at exit: 2,206,534 bytes in 23,835 blocks
==888== total heap usage: 89,539 allocs, 65,705 frees, 8,925,413 bytes allocated
==888==
==888== LEAK SUMMARY:
==888== definitely lost: 18,474 bytes in 3 blocks
==888== indirectly lost: 0 bytes in 0 blocks
==888== possibly lost: 6,924 bytes in 81 blocks
==888== still reachable: 2,092,712 bytes in 23,459 blocks
==888== of which reachable via heuristic:
==888== length64 : 80 bytes in 2 blocks
==888== newarray : 1,552 bytes in 17 blocks
==888== suppressed: 0 bytes in 0 blocks
==888== Rerun with --leak-check=full to see details of leaked memory
==888==
==888== For counts of detected and suppressed errors, rerun with: -v
==888== Use --track-origins=yes to see where uninitialised values come from
==888== ERROR SUMMARY: 10 errors from 6 contexts (suppressed: 0 from 0)
I was able to solve the issue. The problem was not a missing library, but rather that the tegra-egl path was not in the ldconfig configuration.
I updated the BSP installation part of the Docker container to include echo "/usr/lib/aarch64-linux-gnu/tegra-egl" > /etc/ld.so.conf.d/nvidia-tegra-egl.conf, and that fixed the issue of GStreamer blacklisting the DeepStream plugins.
However, whenever I run gst-inspect-1.0 nvtracker to show its information, it still crashes with a segfault (similar to what I posted earlier). This doesn’t seem to cause any issues when running the sample apps, though.
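For anyone applying the same ldconfig fix, here is a rough sketch of the step. The add_ld_path helper and its optional root-prefix argument are made up for illustration (the prefix just makes it easy to try in a scratch directory). Running ldconfig afterwards to rebuild the linker cache is my assumption about what the image build needs, not something taken from the BSP instructions.

```shell
#!/bin/sh
# Hypothetical helper: register an extra library directory with the dynamic linker.
#   $1 = directory to add
#   $2 = name of the conf file to create under /etc/ld.so.conf.d
#   $3 = optional root prefix (leave empty on a real system)
add_ld_path() {
    dir="$1"
    name="$2"
    root="${3:-}"
    mkdir -p "${root}/etc/ld.so.conf.d"
    echo "$dir" > "${root}/etc/ld.so.conf.d/${name}"
}

# Usage during the image build (as root):
# add_ld_path /usr/lib/aarch64-linux-gnu/tegra-egl nvidia-tegra-egl.conf
# ldconfig                        # assumption: rebuild the linker cache
# ldconfig -p | grep -i libEGL    # check which EGL libraries are now visible
# rm -rf /root/.cache/gstreamer-1.0 && gst-inspect-1.0 -b
```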