Stereo camera opencv segmentation fault during exit

I have a simple demo to show the disparity map, using a Jetson Nano 4GB B01:

    import numpy as np
    import cv2

    left_camera = cv2.VideoCapture("nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)1024, height=(int)768,format=(string)NV12, framerate=(fraction)20/1 ! nvvidconv flip-method=2 ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink")
    right_camera = cv2.VideoCapture("nvarguscamerasrc sensor-id=1 ! video/x-raw(memory:NVMM), width=(int)1024, height=(int)768,format=(string)NV12, framerate=(fraction)20/1 ! nvvidconv flip-method=2 ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink")
    stereo = cv2.StereoBM_create(numDisparities=80, blockSize=25)
    while(True):
        ret, left_frame = left_camera.read()
        ret, right_frame = right_camera.read()
        left_frame = cv2.cvtColor(left_frame, cv2.COLOR_BGR2GRAY)
        right_frame = cv2.cvtColor(right_frame, cv2.COLOR_BGR2GRAY)
        disparity = stereo.compute(left_frame, right_frame)
        disparity = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        cv2.imshow('disparity', disparity)
        cv2.imshow('left', left_frame)
        cv2.imshow('right', right_frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    right_camera.release()
    left_camera.release()
    cv2.destroyAllWindows()

It works until I exit (by pressing q), at which point it gives a segmentation fault. Here is the end of the log and the stack trace:

    GST_ARGUS: Cleaning up
    [Thread 0x7f90ff91f0 (LWP 22846) exited]
    [Thread 0x7f737fe1f0 (LWP 22848) exited]
    
    Thread 14 "argus_thread" received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7f73fff1f0 (LWP 22847)]
    0x0000007faeb34508 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so
    (gdb) bt
    #0  0x0000007faeb34508 in  () at /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so
    #1  0x0000007faebb64a0 in  () at /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so
    #2  0x0000007fb2e887f0 in  () at /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvarguscamerasrc.so
    #3  0x0000007fb2e883d4 in  () at /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvarguscamerasrc.so
    #4  0x0000007fb2e88570 in  () at /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvarguscamerasrc.so
    #5  0x0000007fb7e17088 in start_thread (arg=0x7f917f97bf) at pthread_create.c:463
    #6  0x0000007fb7f0bffc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

I am using the following Waveshare stereo camera: https://www.waveshare.com/wiki/IMX219-83_Stereo_Camera

I am using JetPack SDK version 4.5.

    cat /etc/nv_tegra_release 
    # R32 (release), REVISION: 5.0, GCID: 25531747, BOARD: t210ref, EABI: aarch64, DATE: Fri Jan 15 22:55:35 UTC 2021

Hi,
For more information, do you observe the issue when launching a single camera? We would like to know if it is specific to the two-camera case.

Hi DaneLLL,
I created another testcase with a single camera and the segmentation fault continues to happen.
Here is my modified testcase:

    import numpy as np
    import cv2
    
    left_camera = cv2.VideoCapture("nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)640, height=(int)480,format=(string)NV12, framerate=(fraction)20/1 ! nvvidconv flip-method=2 ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink")

    stereo = cv2.StereoBM_create(numDisparities=80, blockSize=25)

    while(True):
        ret, left_frame = left_camera.read()
        left_frame = cv2.cvtColor(left_frame, cv2.COLOR_BGR2GRAY)
        disparity = stereo.compute(left_frame, left_frame)
        disparity = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        cv2.imshow('disparity', disparity)
        cv2.imshow('left', left_frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    left_camera.release()
    cv2.destroyAllWindows()

I could only get it to work without a segmentation fault when I commented out the StereoBM and the disparity map calculation code (which is not a possible solution).

Hi,
Please check if you can run this:
OpenCV Video Capture with GStreamer doesn't work on ROS-melodic - #3 by DaneLLL
See if it shows the video preview.

Hi DaneLLL,
The code in the link you shared shows the video preview.
Please note that the code I shared also shows the video preview and the disparity map. The segmentation fault happens when exiting the program.

You may monitor argus messages in syslog with:

    tail -f /var/log/syslog

and report here.
I suspect that StereoBM is holding a reference to a frame that is not being released. I am not sure how its destructor is called from Python.
Can you add a cv::waitKey(at_least_one_frame_period) before the capture release and check whether this improves things?

Thank you Honey_Patouceul.
I added cv2.waitKey(1000) before left_camera.release() and still got a segmentation fault.
Here is the output from syslog:

    Feb 22 12:08:50 gss-nano systemd[1]: Stopped Argus daemon.
    Feb 22 12:08:54 gss-nano systemd[1]: Started Argus daemon.
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: === NVIDIA Libargus Camera Service (0.97.3)=== Listening for connections...=== python3[19104]: Connection established (7F9B7E01D0)OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module0
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module1
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: OFParserGetVirtualDevice: NVIDIA Camera virtual enumerator not found in proc device-tree
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: ---- imager: No override file found. ----
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: LSC: LSC surface is not based on full res!
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: ---- imager: No override file found. ----
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: LSC: LSC surface is not based on full res!
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: === python3[19104]: CameraProvider initialized (0x7f94b72110)LSC: LSC surface is not based on full res!
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: NvIspAfConfigParamsSanityCheck: Error: positionWorkingHigh is not larger than positionWorkingLow positionWorkingHigh = 0, positionWorkingLow = 0
    Feb 22 12:09:06 gss-nano nvargus-daemon[19058]: message repeated 2 times: [ NvIspAfConfigParamsSanityCheck: Error: positionWorkingHigh is not larger than positionWorkingLow positionWorkingHigh = 0, positionWorkingLow = 0]
    Feb 22 12:09:07 gss-nano nvargus-daemon[19058]: LSC: LSC surface is not based on full res!
    Feb 22 12:09:07 gss-nano nvargus-daemon[19058]: NvIspAfConfigParamsSanityCheck: Error: positionWorkingHigh is not larger than positionWorkingLow positionWorkingHigh = 0, positionWorkingLow = 0
    Feb 22 12:09:07 gss-nano nvargus-daemon[19058]: message repeated 2 times: [ NvIspAfConfigParamsSanityCheck: Error: positionWorkingHigh is not larger than positionWorkingLow positionWorkingHigh = 0, positionWorkingLow = 0]
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: === python3[19104]: Connection closed (7F9B7E01D0)=== python3[19104]: WARNING: CameraProvider was not destroyed before client connection terminated.=== python3[19104]:          The client may have abnormally terminated. Destroying CameraProvider...=== python3[19104]: CameraProvider destroyed (0x7f94b72110)=== python3[19104]: WARNING: Cleaning up 2 outstanding requests...=== python3[19104]: WARNING: Cleaning up 2 outstanding stream settings...=== python3[19104]: WARNING: Cleaning up 1 outstanding queues...=== python3[19104]: WARNING: Cleaning up 2 outstanding sessions...SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function returnFrame(), line 372)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function getBuffer(), line 434)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState:  (propagating from src/components/CaptureContainerImpl.cpp, function assignAllBuffersFromStream(), line 230)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState:  (propagating from src/components/stages/CCDataSetupStage.cpp, function doHandleRequest(), line 68)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 992)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function returnFrame(), line 372)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState:  (propagating from src/services/gl/EGLStreamProducer.cpp, function getBuffer(), line 434)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState:  (propagating from src/components/CaptureContainerImpl.cpp, function assignAllBuffersFromStream(), line 230)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState:  (propagating from src/components/stages/CCDataSetupStage.cpp, function doHandleRequest(), line 68)
    Feb 22 12:09:12 gss-nano nvargus-daemon[19058]: SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
    Feb 22 12:09:17 gss-nano nvargus-daemon[19058]: waitForIdleLocked remaining request 200
    Feb 22 12:09:17 gss-nano nvargus-daemon[19058]: waitForIdleLocked remaining request 199
    Feb 22 12:09:17 gss-nano nvargus-daemon[19058]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 922)
    Feb 22 12:09:17 gss-nano nvargus-daemon[19058]: (Argus) Error Timeout:  (propagating from src/api/CaptureSessionImpl.cpp, function destroy(), line 166)
    Feb 22 12:09:22 gss-nano nvargus-daemon[19058]: waitForIdleLocked remaining request 200
    Feb 22 12:09:22 gss-nano nvargus-daemon[19058]: waitForIdleLocked remaining request 199
    Feb 22 12:09:22 gss-nano nvargus-daemon[19058]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 922)
    Feb 22 12:09:22 gss-nano nvargus-daemon[19058]: SCF: Error Timeout:  (propagating from src/api/Session.cpp, function abortCaptures(), line 888)
    Feb 22 12:09:22 gss-nano nvargus-daemon[19058]: SCF: Error Timeout:  (propagating from src/api/Session.cpp, function shutdown(), line 401)
    Feb 22 12:09:22 gss-nano nvargus-daemon[19058]: PowerServiceCore:handleRequests: timePassed = 5034
    Feb 22 12:09:22 gss-nano nvargus-daemon[19058]: SCF: Error Timeout:  (propagating from src/api/Session.cpp, function shutdown(), line 501)
    Feb 22 12:09:22 gss-nano nvargus-daemon[19058]: SCF: Error Timeout:  (propagating from src/api/CameraDriver.cpp, function deleteSession(), line 627)
    Feb 22 12:09:22 gss-nano nvargus-daemon[19058]: (Argus) Error Timeout:  (propagating from src/api/CaptureSessionImpl.cpp, function destroy(), line 191)

Hi,
Do you hit the segmentation fault with this sample:
OpenCV Video Capture with GStreamer doesn't work on ROS-melodic - #3 by DaneLLL
Or does it only happen when you call StereoBM functions?

Hi DaneLLL,
It only happens when I call StereoBM.

I mentioned before that the segfault happened when exiting the program. It actually happens during the call to the release() method of cv2.VideoCapture.
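One way to double-check which Python call is active at the moment of the crash is the standard-library faulthandler module. The sketch below is only an illustration: the helper name enable_crash_traceback and the log path are my own choices, and since the faulting thread is a native argus_thread, the dump can only show what the Python threads were doing at that time.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Dump the Python traceback of every thread to log_path on a fatal signal.

    Hypothetical helper name. faulthandler needs a real file descriptor, and
    the file must stay open for as long as the handler is installed, so the
    open file object is returned to the caller.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling enable_crash_traceback("/tmp/crash.log") before opening the cameras should leave a traceback in that file after the SIGSEGV. If the crash really happens inside release(), I would expect the main thread's last frame to be the left_camera.release() line.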

I’ve just checked on a Xavier NX running R32.5 with an opencv-4.4 build.
It works fine in my case, but I need to set the GStreamer backend for VideoCapture:

    left_camera = cv2.VideoCapture("nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)640, height=(int)480,format=(string)NV12, framerate=(fraction)20/1 ! nvvidconv flip-method=2 ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink", cv2.CAP_GSTREAMER)

Otherwise it segfaults. However, if it works and only crashes on release, it should be a different issue.

Which version of OpenCV is installed in Python?

    print(cv2.__version__)
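Besides the version, it may help to confirm that the installed OpenCV build was compiled with GStreamer support. A minimal sketch, assuming the usual "GStreamer: YES (version)" line in the text returned by cv2.getBuildInformation(); the helper names are hypothetical:

```python
import re


def gstreamer_enabled(build_info):
    """Return True if the OpenCV build-information text reports GStreamer as YES."""
    match = re.search(r"^\s*GStreamer:\s*(\S+)", build_info, flags=re.MULTILINE)
    return match is not None and match.group(1).upper().startswith("YES")


def report_opencv_build():
    """Print the OpenCV version and whether GStreamer support is compiled in."""
    import cv2  # imported here so the parser above can be used without OpenCV

    print("OpenCV version:", cv2.__version__)
    print("GStreamer enabled:", gstreamer_enabled(cv2.getBuildInformation()))
```

Running report_opencv_build() on the Jetson prints both values in one go.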

I am using OpenCV version 4.1.1.
I added the cv2.CAP_GSTREAMER argument, but I am still getting a segmentation fault.

Hi,
Please check if the issue happens with videotestsrc:

    videotestsrc is-live=1 ! video/x-raw, width=(int)640, height=(int)480,format=(string)NV12, framerate=(fraction)20/1 ! nvvidconv flip-method=2 ! video/x-raw(memory:NVMM),format=NV12 ! nvvidconv  ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink

If it is confirmed to be an issue in nvarguscamerasrc, please download the source code and add prints for further debugging. We don’t test the OpenCV function, and it might be a race condition.

The source code is in
https://developer.nvidia.com/embedded/L4T/r32_Release_v5.0/sources/T210/public_sources.tbz2
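To switch between the camera and the test source without retyping the long pipeline strings, something like the following could be used. This is only a sketch: the function names and default values are hypothetical, and the element chains are copied from the pipelines posted in this thread.

```python
def argus_pipeline(sensor_id=0, width=640, height=480, fps=20, flip_method=2):
    """Build the nvarguscamerasrc pipeline string used earlier in this thread."""
    return (
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        f"video/x-raw(memory:NVMM), width=(int){width}, height=(int){height},"
        f"format=(string)NV12, framerate=(fraction){fps}/1 ! "
        f"nvvidconv flip-method={flip_method} ! "
        "video/x-raw, format=(string)BGRx ! videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
    )


def videotestsrc_pipeline(width=640, height=480, fps=20, flip_method=2):
    """Build the videotestsrc pipeline suggested above, with no camera involved."""
    return (
        "videotestsrc is-live=1 ! "
        f"video/x-raw, width=(int){width}, height=(int){height},"
        f"format=(string)NV12, framerate=(fraction){fps}/1 ! "
        f"nvvidconv flip-method={flip_method} ! "
        "video/x-raw(memory:NVMM),format=NV12 ! nvvidconv ! "
        "video/x-raw, format=(string)BGRx ! videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
    )
```

Either string can then be passed to cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER). If the crash disappears with videotestsrc_pipeline(), that points at nvarguscamerasrc rather than at OpenCV.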

I have the same issue on the latest Jetson Nano Jetpack (4.5-b129).
I recompiled libgstnvarguscamerasrc.so with debug symbols (btw “Status” is ambiguous, I had to specify the namespace by “Argus::Status”).
I got the following result:

    GST_ARGUS: Cleaning up
    [Thread 0x7fa9c0a910 (LWP 10840) exited]
    [Thread 0x7fa8c08910 (LWP 10842) exited]

    Thread 7 "argus_thread" received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fa9409910 (LWP 10841)]
    0x0000007faf61c508 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so
    (gdb) backtrace
    #0 0x0000007faf61c508 in () at /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so
    #1 0x0000007faf69e4a0 in () at /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so
    #2 0x0000007faf760d0c in ArgusCamera::StreamConsumer::threadExecute(_GstNvArgusCameraSrc*) (this=0x7faac0c000, src=0x555598ece0) at gstnvarguscamerasrc.cpp:308
    #3 0x0000007faf7606b4 in ArgusSamples::ThreadArgus::threadFunction(_GstNvArgusCameraSrc*) (this=0x7faac0c000, src=0x555598ece0) at gstnvarguscamerasrc.cpp:196
    #4 0x0000007faf76054c in ArgusSamples::ThreadArgus::threadFunctionStub(void*) (dataPtr=0x7faac0c000) at gstnvarguscamerasrc.cpp:175
    #5 0x0000007fb6d78088 in start_thread (arg=0x7faac0bdef) at pthread_create.c:463
    #6 0x0000007fb6e81ffc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Further analysis by Valgrind:

    CONSUMER: Done Success
    GST_ARGUS: Cleaning up
    GST_ARGUS: Done Success
    GST_ARGUS: Cleaning up
    ==9855== Thread 14:
    ==9855== Invalid read of size 8
    ==9855== at 0xDD39508: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so)
    ==9855== by 0xDDBB49F: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so)
    ==9855== by 0xCFF4D0B: ArgusCamera::StreamConsumer::threadExecute(_GstNvArgusCameraSrc*) (gstnvarguscamerasrc.cpp:308)
    ==9855== by 0xCFF46B3: ArgusSamples::ThreadArgus::threadFunction(_GstNvArgusCameraSrc*) (gstnvarguscamerasrc.cpp:196)
    ==9855== by 0xCFF454B: ArgusSamples::ThreadArgus::threadFunctionStub(void*) (gstnvarguscamerasrc.cpp:175)
    ==9855== by 0x556E087: start_thread (pthread_create.c:463)
    ==9855== by 0x54C8FFB: thread_start (clone.S:78)
    ==9855== Address 0x58 is not stack'd, malloc'd or (recently) free'd
    ==9855==
    ==9855==
    ==9855== Process terminating with default action of signal 11 (SIGSEGV)
    ==9855== Access not within mapped region at address 0x58
    ==9855== at 0xDD39508: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so)
    ==9855== by 0xDDBB49F: ??? (in /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so)
    ==9855== by 0xCFF4D0B: ArgusCamera::StreamConsumer::threadExecute(_GstNvArgusCameraSrc*) (gstnvarguscamerasrc.cpp:308)
    ==9855== by 0xCFF46B3: ArgusSamples::ThreadArgus::threadFunction(_GstNvArgusCameraSrc*) (gstnvarguscamerasrc.cpp:196)
    ==9855== by 0xCFF454B: ArgusSamples::ThreadArgus::threadFunctionStub(void*) (gstnvarguscamerasrc.cpp:175)
    ==9855== by 0x556E087: start_thread (pthread_create.c:463)
    ==9855== by 0x54C8FFB: thread_start (clone.S:78)
    ==9855== If you believe this happened as a result of a stack
    ==9855== overflow in your program's main thread (unlikely but
    ==9855== possible), you can try to increase the size of the
    ==9855== main thread stack using the --main-stacksize= flag.
    ==9855== The main thread stack size used in this run was 8388608.

Finally I got it.

StreamConsumer::threadExecute in gstnvarguscamerasrc.cpp is not processing all the events.

iEventProvider_ptr->waitForEvents() is transferring all the events from the event provider to the local event queue, but then the processing is not checking how many events have been received.

At the next call of waitForEvents(), all the events remaining in the local queue are destroyed.
This appears to trigger the issue on a clean exit (again, libnvargus_socketclient.so seems to have a bug).

I did a quick and dirty fix (attached).

gstnvarguscamerasrc.cpp (71.7 KB)
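To illustrate the event handling described above, here is a toy model in Python. It is not the Argus API and not the attached fix: the class and function names are made up, and the only assumption it encodes is that each wait replaces the local queue contents, so whatever was left unprocessed is lost.

```python
from collections import deque


class ToyEventQueue:
    """Toy local event queue whose contents are replaced on every wait."""

    def __init__(self):
        self._events = deque()

    def refill(self, new_events):
        # Anything still queued from the previous wait is discarded here.
        dropped = list(self._events)
        self._events = deque(new_events)
        return dropped

    def __len__(self):
        return len(self._events)

    def pop(self):
        return self._events.popleft()


def process_first_only(queue, handle):
    """Handle at most one event per wait (the behaviour described above)."""
    if len(queue):
        handle(queue.pop())


def process_all(queue, handle):
    """Drain every event delivered by the last wait before waiting again."""
    while len(queue):
        handle(queue.pop())


def run(batches, process):
    """Feed batches of events through a queue; return (handled, dropped)."""
    queue, handled, dropped = ToyEventQueue(), [], []
    for batch in batches:
        dropped.extend(queue.refill(batch))
        process(queue, handled.append)
    return handled, dropped
```

With two batches ["A", "B"] and ["C"], process_first_only handles A and C and silently drops B, while process_all handles all three events.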

I found 2 issues in gstnvarguscamerasrc.cpp:

  1. src->queue is destroyed in gst_nv_argus_camera_stop while still being used in the StreamConsumer thread waitForEvents(src->queue.get, ...).
  2. gst_buffer_pool_acquire_buffer in consumer_thread for some reason gets blocked during shutdown, so the join of consumer_thread in gst_nv_argus_camera_stop hangs.

I solved both problems by changing the order of calls in gst_nv_argus_camera_stop:

  1. Emit the eos_cond signal and join the argus thread before src->queue.reset(), so the StreamConsumer thread finishes before the queue is deleted.
  2. Deactivate src->pool before joining consumer_thread.

Here is a patch file that implements those changes:
gstnvarguscamerasrc.patch (1.7 KB)
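The ordering in the first point can be sketched with plain Python threading. This is a toy model, not the plugin code: ToySource and its attributes are hypothetical stand-ins for the GStreamer element, its event queue and the argus thread.

```python
import queue
import threading


class ToySource:
    """Toy stand-in for a source element with a worker thread and a queue."""

    def __init__(self):
        self.events = queue.Queue()
        self.stop_requested = threading.Event()
        self.errors = []
        self.worker = threading.Thread(target=self._run)

    def _run(self):
        while not self.stop_requested.is_set():
            events = self.events
            if events is None:
                # This is the situation the reordering is meant to avoid.
                self.errors.append("queue destroyed while worker still running")
                return
            try:
                events.get(timeout=0.01)
            except queue.Empty:
                pass

    def start(self):
        self.worker.start()

    def stop(self):
        self.stop_requested.set()  # 1. tell the worker to finish
        self.worker.join()         # 2. wait until it no longer uses the queue
        self.events = None         # 3. only now destroy the queue
```

Swapping steps 2 and 3 in stop() lets the worker observe a destroyed queue, which in this model corresponds to the use-after-free described in the first issue.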

Where does the gstnvarguscamerasrc.cpp file reside? I am unable to locate the file and implement the changes in your patch.