Segmentation faults and memory corruption with nvv4l2decoder

icetana · October 18, 2021, 7:39am

Hardware Platform (Jetson / GPU) - Multiple dGPU
DeepStream Version - v5.0.1
JetPack Version (valid for Jetson only) - N/A
TensorRT Version - N/A
Issue Type (questions, new requirements, bugs) - Bug

We’re not quite sure how to report this, as we haven’t been able to pinpoint the cause, so apologies in advance. For some time we’ve been battling issues that we believe are caused by nvv4l2decoder and that manifest as memory corruption and segmentation faults.

Example #1

(dstreamer:1): GStreamer-CRITICAL **: 01:09:10.493: gst_buffer_get_sizes_range: assertion 'GST_IS_BUFFER (buffer)' failed

(gdb) bt
#0 0x00007ffff71477b9 in gst_buffer_copy_into () from /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#1 0x00007fffcc9e7c02 in gst_v4l2_video_dec_loop () from /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libgstnvvideo4linux2.so
#2 0x00007ffff71b4269 in ?? () from /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#3 0x00007ffff770cb40 in ?? () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#4 0x00007ffff770c175 in ?? () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff7bbd6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007ffff5b0ea3f in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) info symbol 0x00007ffff71477b9
gst_buffer_copy_into + 1417 in section .text of /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0

This is a good example of memory corruption. At the linked line (gstreamer/gstbuffer.c at 1.14.5 · GStreamer/gstreamer · GitHub) within the source for gstbuffer, the dest and src buffers are checked to see if they are NULL.

Later on, the above assertion occurs at line gstreamer/gstbuffer.c at 1.14.5 · GStreamer/gstreamer · GitHub.

The odd part is that the src buffer passes the NULL check but is NULL later on, even though the function itself never sets it to NULL. This can only happen if memory corruption is occurring, specifically from a separate thread.
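The check-then-use pattern described above can be reproduced in miniature. The following is a minimal sketch in Python, not GStreamer code: the `Slot` class and the `consumer` and `corruptor` functions are hypothetical names, and the two events exist only to force the interleaving deterministically. It illustrates how a value can pass a validity check and still be invalid at the point of use when another thread writes to the same location in between.

```python
import threading


class Slot:
    """Hypothetical stand-in for a shared pointer to a buffer."""

    def __init__(self, value):
        self.value = value


def run_race():
    slot = Slot(b"frame-data")
    checked = threading.Event()    # set once the consumer has passed its check
    clobbered = threading.Event()  # set once the other thread has overwritten the slot
    result = {}

    def consumer():
        # Step 1: the entry check succeeds, like a NULL check at function entry.
        result["passed_check"] = slot.value is not None
        checked.set()
        # Step 2: the other thread runs; the consumer itself never writes to the slot.
        clobbered.wait()
        # Step 3: the later validity test fails on the same slot.
        result["valid_at_use"] = slot.value is not None

    def corruptor():
        checked.wait()
        slot.value = None  # stands in for a stray write from another thread
        clobbered.set()

    threads = [threading.Thread(target=consumer), threading.Thread(target=corruptor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(run_race())
```

In a real pipeline the interleaving is not forced, which would be consistent with the failure appearing only occasionally.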

Example #2

==1== Thread 18:
==1== Jump to the invalid address stated on the next line
==1==    at 0x0: ???
==1==    by 0x98238BF: nvds_clear_meta_list (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x98222E6: release_frame_meta (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x9822F9F: nvds_destroy_meta_pool (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x9822A6C: nvds_destroy_frame_meta_pool (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x98206E5: nvds_destroy_batch_meta (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x9820DB3: nvds_batch_meta_release_func (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x6CD9AC4: gst_nvds_meta_free (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvdsgst_meta.so)
==1==    by 0x580F5EE: gst_buffer_foreach_meta (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)
==1==    by 0x581261D: gst_buffer_pool_release_buffer (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)
==1==    by 0x580B9F2: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)
==1==    by 0x5840A1B: gst_mini_object_unref (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)
==1==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1==
==1==
==1== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1==  Bad permissions for mapped region at address 0x0
==1==    at 0x0: ???
==1==    by 0x98238BF: nvds_clear_meta_list (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x98222E6: release_frame_meta (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x9822F9F: nvds_destroy_meta_pool (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x9822A6C: nvds_destroy_frame_meta_pool (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x98206E5: nvds_destroy_batch_meta (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x9820DB3: nvds_batch_meta_release_func (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_meta.so)
==1==    by 0x6CD9AC4: gst_nvds_meta_free (in /opt/nvidia/deepstream/deepstream-5.0/lib/libnvdsgst_meta.so)
==1==    by 0x580F5EE: gst_buffer_foreach_meta (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)
==1==    by 0x581261D: gst_buffer_pool_release_buffer (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)
==1==    by 0x580B9F2: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)
==1==    by 0x5840A1B: gst_mini_object_unref (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1405.0)

As per above, a segmentation fault is being caused upon attempting to free frame-specific NVIDIA DeepStream metadata. The problem is that the free function pointer on the metadata appears to have been set to NULL.

In the past, this has been traced to metadata of our own creation. However, we have debugged this and we do attach a free function to the DeepStream metadata; otherwise the pipeline would have been failing soon after starting. So the odd thing is that this is happening randomly and part-way through processing a stream with the pipeline.

The most plausible explanation we can see for this segmentation fault is memory corruption.

Example #3

Segmentation fault (read-fault for memory address outside of heap memory) occurring within the unsigned char* std::copy<__gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*) method.

As per callgrind, the call to unsigned char* std::copy<__gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*) is 13 calls deep from the FramesRecord::FramesRecord(FramesRecord const&) call, which means that our segmentation fault is occurring after 12 template-generated standard library calls. It is extremely unlikely that a segmentation fault would occur that many standard-library calls deep because of data passed in from our code. That heavily suggests that the issue in our case is being caused by memory corruption.

Example #4

(dstreamer:1): GStreamer-CRITICAL **: 08:07:57.754: gst_buffer_add_reference_timestamp_meta: assertion 'GST_IS_CAPS (reference)' failed
50:11:25.239312381 1 0x7fa504000ed0 ERROR GST_BUFFER gstbuffer.c:642:gst_buffer_copy_into: failed to copy meta 0x7fa4189f5388 of API type GstReferenceTimestampMetaAPI
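GStreamer colourises its debug output with ANSI escape sequences, which makes captured logs awkward to grep or paste. A minimal sketch of stripping those sequences from already-captured lines follows; the `strip_ansi` helper is a hypothetical name, and the pattern only covers colour (SGR) sequences. As far as we know, setting the `GST_DEBUG_NO_COLOR=1` environment variable before launching the pipeline avoids the colour codes at the source.

```python
import re

# Matches ANSI SGR (colour/style) sequences such as ESC[1;35m or ESC[0m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(line: str) -> str:
    """Return the log line with colour escape sequences removed."""
    return ANSI_SGR.sub("", line)


if __name__ == "__main__":
    raw = "(dstreamer:1): GStreamer-\x1b[1;35mCRITICAL\x1b[0m **: assertion failed"
    print(strip_ansi(raw))
```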

As per above, we are hitting a failure in gst_buffer_add_reference_timestamp_meta. However, it does not occur within our pad-probe callback but within gst_buffer_copy_into, which is an unexpected source for this error.

This function is called from many places, but among them are:

gst_mini_object_copy
gst_v4l2_video_dec_loop
gst_v4l2_buffer_pool_copy_buffer
An unknown function from libgstvideoparsersbad

The fact that gst_v4l2_video_dec_loop and gst_v4l2_buffer_pool_copy_buffer show up within this call tree, as per callgrind’s findings, suggests that there could be a link between these calls and the above error.

In addition, since this is occurring in a similar place to Example #1, it is likely that these two issues are linked.

We sadly can’t narrow things down any further, but between Examples 1 and 4, we believe that gst_v4l2_video_dec_loop is a good place to start.
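The cross-referencing between Examples 1 and 4 can be done mechanically. Below is a minimal sketch, assuming gdb-style frames of the form `#N 0xADDR in FUNC ()`. The `functions_in_backtrace` helper and both constants are hypothetical names, with the data abridged from the output above. It intersects the functions in the Example 1 backtrace with the callers reported for Example 4.

```python
import re

# gdb frame: "#1 0x00007fffcc9e7c02 in gst_v4l2_video_dec_loop () from ..."
FRAME = re.compile(r"#\d+\s+0x[0-9a-f]+ in (\S+) \(\)")

EXAMPLE_1_BACKTRACE = """
#0 0x00007ffff71477b9 in gst_buffer_copy_into () from libgstreamer-1.0.so.0
#1 0x00007fffcc9e7c02 in gst_v4l2_video_dec_loop () from libgstnvvideo4linux2.so
#2 0x00007ffff71b4269 in ?? () from libgstreamer-1.0.so.0
#5 0x00007ffff7bbd6db in start_thread () from libpthread.so.0
#6 0x00007ffff5b0ea3f in clone () from libc.so.6
"""

# Named callers of gst_buffer_copy_into listed for Example 4.
EXAMPLE_4_CALLERS = {
    "gst_mini_object_copy",
    "gst_v4l2_video_dec_loop",
    "gst_v4l2_buffer_pool_copy_buffer",
}


def functions_in_backtrace(text):
    """Return the set of resolved function names in a gdb backtrace."""
    return {name for name in FRAME.findall(text) if name != "??"}


if __name__ == "__main__":
    common = functions_in_backtrace(EXAMPLE_1_BACKTRACE) & EXAMPLE_4_CALLERS
    print(sorted(common))
```

An overlap like this only narrows where to look; it does not show that the shared function is where the corruption originates.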

Fiona.Chen · October 18, 2021, 8:19am

Is there any method to reproduce these errors?

Please upgrade to the latest DeepStream 5.1.

system · November 9, 2021, 1:15am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.