SIGSEGV error with Argus JP 5.1.5

Hi,

We are working on a GStreamer application in which we capture from 8 IMX390 cameras on a Jetson AGX Xavier. We are trying to migrate to JP 5.1.5 but we are running into the following issue:

Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
0:00:01.518066239 87571 0xaaaaf5ef6b00 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1504:gst_nv_argus_camera_set_caps:<nvarguscamerasrc0> Received caps video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 31.622776; Exposure Range min 118000, max 33333000;

GST_ARGUS: Running with following settings:
   Camera index = 4
   Camera mode  = 0
   Output Stream W = 1920 H = 1080
   seconds to Run    = 0
   Frame Rate = 29.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.

(Argus) Error FileOperationFailed: Failed socket read: Connection reset by peer (in src/rpc/socket/common/SocketUtils.cpp, function readSocket(), line 79)
(Argus) Error FileOperationFailed: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error FileOperationFailed: Receive worker failure, notifying 1 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 1 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error FileOperationFailed: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error FileOperationFailed: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error FileOperationFailed:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
0:00:29.961106836 21272 0xaaaae6cf8580 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1769:consumer_thread:<nvarguscamerasrc0> consumer_thread: stop_requested=1

CONSUMER: Done Success
Got EOS from element "pipeline0".
Execution ended after 0:00:29.778021455
Setting pipeline to NULL ...
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
Caught SIGSEGV

This issue happens randomly when starting one of the cameras, and it can happen with any camera. The pipeline we are using is simply:

gst-launch-1.0 nvarguscamerasrc sensor-id=$i ! fakesink

We have tested the camera stability with v4l2-ctl and we have no issues with capture there. Have you guys seen this issue at all? Is this a known issue or are there any ways we can avoid it?

Any help will be much appreciated.

Please check the patch below.

[ARGUS stability]
https://forums.developer.nvidia.com/t/nvarguscamerasrc-timeout-jetpack-5-1-4/316367/3

Hi @ShaneCCC, thanks for the patches. I tested them on JP 5.1.5 but they don’t seem to fix our issue. In fact, I now see another error:

Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
0:00:01.527007615  4647 0xaaaac0f4fb60 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1504:gst_nv_argus_camera_set_caps:<nvarguscamerasrc0> Received caps video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Error generated. /dvs/git/dirty/git-master_linux/multimedia/nvgstreamer/gst-nvarguscamera/gstnvarguscamerasrc.cpp, threadExecute:723 NvBufSurfaceFromFd Failed.
Error generated. /dvs/git/dirty/git-master_linux/multimedia/nvgstreamer/gst-nvarguscamera/gstnvarguscamerasrc.cpp, threadFunction:242 (propagating)
0:00:05.534105401  4647 0xaaaac12dfd80 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1769:consumer_thread:<nvarguscamerasrc0> consumer_thread: stop_requested=1

GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 31.622776; Exposure Range min 118000, max 33333000;

GST_ARGUS: Running with following settings:
   Camera index = 0 
   Camera mode  = 0 
   Output Stream W = 1920 H = 1080 
   seconds to Run    = 0 
   Frame Rate = 29.999999 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
nvbuf_utils: dmabuf_fd -1 mapped entry NOT found
Got EOS from element "pipeline0".
Execution ended after 0:00:04.007294918
Setting pipeline to NULL ...
0:00:11.296506201  4647 0xaaaac12dfde0 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1675:argus_thread:<nvarguscamerasrc0> argus_thread: stop_requested=1

GST_ARGUS: Cleaning up
GST_ARGUS: Done Success
Freeing pipeline ...

Do you have any other patches I can test? They can be for either JP 5.1.2 or JP 5.1.5, we have the same issue on both.

Do you have a script to reproduce this issue?

We use the following script:

#!/bin/bash

start_cameras() {
    local i
    # Start one pipeline per camera (sensor-id 0 to 7)
    for i in {0..7}; do
        echo "Starting pipeline for sensor-id=$i, logging to log$i.txt"
        # Mark the start of this run in the log
        echo "============================" >> "log$i.txt"
        date >> "log$i.txt"
        # Run the pipeline in the background and append its output to the log
        GST_DEBUG=2,*argus*:6 gst-launch-1.0 nvarguscamerasrc sensor-id=$i ! perf ! fakesink >> "log$i.txt" 2>&1 &
        sleep 5
    done
    echo "All pipelines are running. Logs are being saved to log<i>.txt files."
}

# Repeat the start/stop cycle 101 times
for i in {0..100}; do
    start_cameras
    sleep 30
    # Send SIGINT so gst-launch-1.0 shuts the pipelines down
    pkill -2 gst-launch-1.0
    sleep 5
    # Restart the Argus daemon before the next iteration
    systemctl restart nvargus-daemon
    sleep 10
done
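
In case it helps with comparing failure rates between runs, here is a rough sketch for counting failed runs per log file. It assumes the log layout produced by the script above (each run starts with a line of '=' characters) and treats a run as failed if it contains "Caught SIGSEGV" or "DISCONNECTED", the two messages seen in this thread. The function name summarize_log is just an illustrative name, not part of any NVIDIA tooling.

```shell
#!/bin/bash
# Sketch: count failed runs in each log<i>.txt written by the repro script.
# Assumption: every run begins with a separator line made only of '=' characters.

summarize_log() {
    # $1: path to one log file (e.g. log0.txt)
    awk -v name="$1" '
        /^=+$/ {
            # A separator closes the previous run and opens a new one
            if (in_run && failed) failures++
            runs++; in_run = 1; failed = 0
            next
        }
        # Failure signatures taken from the logs posted in this thread
        /Caught SIGSEGV/ || /DISCONNECTED/ { failed = 1 }
        END {
            # Account for the last run in the file
            if (in_run && failed) failures++
            printf "%s: %d failed of %d runs\n", name, failures, runs
        }
    ' "$1"
}

for f in log[0-9]*.txt; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    summarize_log "$f"
done
```

Run it from the directory that holds the log files. A run that shows both messages is counted once.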

I will try with 6 cameras, which is all I have.

Could you also verify with 6 cameras?

Thanks

I’m still able to reproduce this with 6 cameras. The failure rate is overall lower but it still happens.

Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
0:00:00.172837914 73168 0xaaaae1bb5b00 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1449:gst_nv_argus_camera_set_caps:<nvarguscamerasrc0> Received caps video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
Setting pipeline to PLAYING ...
New clock: GstSystemClock
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 31.622776; Exposure Range min 118000, max 33333000;

GST_ARGUS: Running with following settings:
   Camera index = 1
   Camera mode  = 0
   Output Stream W = 1920 H = 1080
   seconds to Run    = 0
   Frame Rate = 29.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
ERROR: from element /GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0: DISCONNECTED
Additional debug info:
Argus Error Status
CONSUMER: ERROR OCCURRED
EOS on shutdown enabled -- waiting for EOS after Error
Waiting for EOS...
(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
0:00:12.374410027 73168 0xaaaae1f44580 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1714:consumer_thread:<nvarguscamerasrc0> consumer_thread: stop_requested=1

Got EOS from element "pipeline0".
EOS received - stopping pipeline...
Execution ended after 0:00:12.200197577
Setting pipeline to NULL ...
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
Caught SIGSEGV

Could you modify the sensor driver to use a fixed exposure and frame rate, for example by making the CID functions dummy functions, to verify?

Thanks

Hi @ShaneCCC ,

I modified the driver as instructed, but we are still getting some errors. Do you have any other suggestions?

Thanks

Please get the daemon log:

journalctl -u nvargus-daemon -f
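
If following the log live is awkward inside a test loop, one possible alternative is to dump the daemon journal for a single iteration into a file after the fact. This is only a sketch: save_daemon_log and the JOURNALCTL override variable are hypothetical names introduced here, while -u, --since and --no-pager are standard journalctl options.

```shell
#!/bin/bash
# Sketch: save the nvargus-daemon journal entries for one test iteration.
# JOURNALCTL is a hypothetical override so the helper can be tried without
# systemd; by default the real journalctl is used.

save_daemon_log() {
    # $1: start time accepted by journalctl --since, e.g. "2025-01-01 12:00:00"
    # $2: output file
    "${JOURNALCTL:-journalctl}" -u nvargus-daemon --since "$1" --no-pager > "$2"
}

# Example use around one iteration (the file name is illustrative):
#   start=$(date '+%Y-%m-%d %H:%M:%S')
#   ... start the cameras, wait, stop them ...
#   save_daemon_log "$start" "daemon_iteration.log"
```

Saving one file per iteration makes it easier to attach only the log of the iteration that failed.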

Hi @ShaneCCC ,

I’ve attached the log from an iteration that failed with SIGSEGV.

segfault.log (39.6 KB)

The SIGSEGV could be caused by repeated capture errors while the client still tries to reconnect.

Does the capture timeout error happen every time?

No, it doesn’t fail every time. It just fails on some iterations, and on JP 5 we don’t see timeouts, just SIGSEGV or DISCONNECTED failures.

Please try the patch below.

libnvscf.so.r35.6.close (8.4 MB)

Hi @ShaneCCC ,

I still see some SIGSEGV errors; it fails with the error 5% of the time. Any other suggestions? I’ve also attached the daemon log of one iteration that failed.

single_failure.log (17.2 KB)

How many cameras are you using for the test?

Could you verify with argus_camera?

We are testing with 8 cameras. We have tested argus_camera and haven’t seen any issues with it.

Please help verify with 6 cameras.

Thanks

It can still happen with 6 cameras, although less frequently.

Could you verify with a single camera?

Thanks