Hi team!
We’re using DeepStream 5 GA in a production environment, and everything was working fine until I started using OpenCV. My application is quite complex, so I’ve reproduced the bug using the deepstream-test4 app, which uses Kafka.
I’ve spent the weekend debugging using gdb and valgrind, and both tell me the same story:
GDB:
(gdb) bt
#0 0x00007ffff5f51f47 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff5f538b1 in __GI_abort () at abort.c:79
#2 0x00007ffff5f9c907 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff60c9dfa "%s\n")
at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff5fa397a in malloc_printerr (str=str@entry=0x7ffff60c7fe8 "free(): invalid pointer")
at malloc.c:5350
#4 0x00007ffff5faae8c in _int_free (have_lock=0, p=0x7fff40004e92, av=0x7ffff62fec40 <main_arena>)
at malloc.c:4157
#5 0x00007ffff5faae8c in __GI___libc_free (mem=0x7fff40004ea2) at malloc.c:3124
#6 0x00007fff74d4ef91 in json_delete () at /usr/lib/x86_64-linux-gnu/libjansson.so.4
#7 0x00007fff7524a649 in json_decref(json_t*) (json=0x7fff40005000)
at /dvs/git/dirty/git-master_linux/deepstream/sdk/../../3rdparty/jansson/2.7/src/jansson.h:112
#8 0x00007fff7524a9ef in json_get_key_value(char const*, int, char const*, char*, int) (msg=0x7fff4000b430 "{\n \"messageid\" : \"7203228c-63ba-42cc-9607-e0e0acec82f6\",\n \"mdsversion\" : \"1.0\",\n \"@timestamp\" : \"2020-09-13T11:18:06.008Z\",\n \"place\" : {\n \"id\" : \"1\",\n \"name\" : \"XYZ\",\n \"type\" : \"garage\",\n"..., msglen=1774, path=0x555556328ed0 "sensor.id", value=0x7fff4bffde50 "0\233\251\367\377\177", nbuf=100) at json_helper.cpp:92
#9 0x00007fff75248d0f in nvds_msgapi_send_async(NvDsMsgApiHandle, char*, uint8_t const*, size_t, nvds_msgapi_send_cb_t, void*) (h_ptr=0x555556328ad0, topic=0x555556322d00 "anpr", payload=0x7fff4000b430 "{\n \"messageid\" : \"7203228c-63ba-42cc-9607-e0e0acec82f6\",\n \"mdsversion\" : \"1.0\",\n \"@timestamp\" : \"2020-09-13T11:18:06.008Z\",\n \"place\" : {\n \"id\" : \"1\",\n \"name\" : \"XYZ\",\n \"type\" : \"garage\",\n"..., nbuf=1774, send_callback=0x7fff758a0e5e <nvds_msgapi_send_callback>, user_ptr=0x5555563feb60)
at nvds_kafka_proto.cpp:593
#10 0x00007fff758a2897 in legacy_gst_nvmsgbroker_render ()
at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_msgbroker.so
#11 0x00007fff758a2632 in gst_nvmsgbroker_render ()
at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_msgbroker.so
#12 0x00007fffd5680cee in () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#13 0x00007fffd5681a60 in () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#14 0x00007ffff7b1088b in () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#15 0x00007ffff7b18bb3 in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#16 0x00007fffd568bcaf in () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#17 0x00007ffff7b1088b in () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#18 0x00007ffff7b18bb3 in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#19 0x00007fffd58efba9 in () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstcoreelements.so
#20 0x00007ffff7b45269 in () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#21 0x00007ffff75a3b40 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#22 0x00007ffff75a3175 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x00007ffff55556db in start_thread (arg=0x7fff4bfff700) at pthread_create.c:463
#24 0x00007ffff6034a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Valgrind:
Frame Number = 0 Vehicle Count = 4 Person Count = 2
Frame Number = 1 Vehicle Count = 4 Person Count = 2
==22073== Thread 13 nvtee-que1:src:
==22073== Invalid free() / delete / delete[] / realloc()
==22073== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22073== by 0x517E9F90: json_delete (in /usr/lib/x86_64-linux-gnu/libjansson.so.4.11.0)
==22073== by 0x512F1648: json_decref (jansson.h:112)
==22073== by 0x512F19EE: json_get_key_value(char const*, int, char const*, char*, int) (json_helper.cpp:92)
==22073== by 0x512EFD0E: nvds_msgapi_send_async (nvds_kafka_proto.cpp:593)
==22073== by 0x50C92896: legacy_gst_nvmsgbroker_render (in /opt/nvidia/deepstream/deepstream-5.0/lib/gst-plugins/libnvdsgst_msgbroker.so)
==22073== by 0x50C92631: gst_nvmsgbroker_render (in /opt/nvidia/deepstream/deepstream-5.0/lib/gst-plugins/libnvdsgst_msgbroker.so)
==22073== by 0x28B88CED: ??? (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1405.0)
==22073== by 0x28B89A5F: ??? (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1405.0)
==22073== by 0x4EB288A: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)
==22073== by 0x4EBABB2: gst_pad_push (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)
==22073== by 0x28B93CAE: ??? (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1405.0)
==22073== Address 0xf4cfa0c2 is 2 bytes inside a block of size 256 alloc'd
==22073== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22073== by 0x517E4530: ??? (in /usr/lib/x86_64-linux-gnu/libjansson.so.4.11.0)
==22073== by 0x517EA218: json_object_set_new_nocheck (in /usr/lib/x86_64-linux-gnu/libjansson.so.4.11.0)
==22073== by 0x517E5CBD: ??? (in /usr/lib/x86_64-linux-gnu/libjansson.so.4.11.0)
==22073== by 0x517E5F55: ??? (in /usr/lib/x86_64-linux-gnu/libjansson.so.4.11.0)
==22073== by 0x517E61F8: json_loadb (in /usr/lib/x86_64-linux-gnu/libjansson.so.4.11.0)
==22073== by 0x512F16AE: json_get_key_value(char const*, int, char const*, char*, int) (json_helper.cpp:34)
==22073== by 0x512EFD0E: nvds_msgapi_send_async (nvds_kafka_proto.cpp:593)
==22073== by 0x50C92896: legacy_gst_nvmsgbroker_render (in /opt/nvidia/deepstream/deepstream-5.0/lib/gst-plugins/libnvdsgst_msgbroker.so)
==22073== by 0x50C92631: gst_nvmsgbroker_render (in /opt/nvidia/deepstream/deepstream-5.0/lib/gst-plugins/libnvdsgst_msgbroker.so)
==22073== by 0x28B88CED: ??? (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1405.0)
==22073== by 0x28B89A5F: ??? (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1405.0)
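It seems there’s a bug in nvds_msgapi_send_async, which hits an invalid free while handling the payload (both traces point at json_decref, called from json_get_key_value). The funny thing is that all you have to do is include OpenCV and add a function that performs some OpenCV operation. You don’t even have to call that function; the mere presence of OpenCV in your code is enough to make the Kafka message broker crash with "free(): invalid pointer".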
To further corroborate my claim, I wrote a custom GStreamer plugin using librdkafka and replaced nvmsgbroker with my plugin, and everything works absolutely fine.
I’m attaching my code for your perusal. Looking for guidance. Thanks
• Hardware Platform (GPU): Tesla T4
• DeepStream Version: 5.0 GA
• TensorRT Version: 7
• NVIDIA GPU Driver Version: 440.33.01
kafka_bug.zip (62.7 KB)