Argus errors when restarting pipeline - Jetson AGX

Hi,

We are working on an accelerated GStreamer application that captures from 8 cameras. It is mandatory for us to have all the cameras streaming, so if we lose any stream we restart the respective pipeline. We have logic in place that detects the failure and triggers the restart. If the pipeline fails with a CaptureSession error, it restarts without problems. However, if the pipeline fails with a different error (such as a timeout error), it is not able to restart, and we get the following error from the Argus log:

Error generated. gstnvarguscamerasrc.cpp, execute:842 NULL SensorMode interface detected
(Argus) Error InvalidState:  (propagating from src/eglstream/FrameConsumerImpl.cpp, function streamEventThread(), line 135)
(Argus) Error InvalidState:  (propagating from src/eglstream/FrameConsumerImpl.cpp, function streamEventThreadStatic(), line 177)
0:01:46.466919502 23649     0x36e9a1e0 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1447:gst_nv_argus_camera_set_caps:<nvarguscamerasrc3> Received caps video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
0:01:46.468691206 23649   0x7f10022ad0 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1775:consumer_thread:<nvarguscamerasrc3> consumer_thread: stop_requested=0

Some additional notes:

  1. The setup is based on a Jetson AGX Xavier device, JetPack 4.6.2 (L4T r32.7.2).
  2. We are working with 8 IMX390 GMSL cameras.
  3. The issue happens even with one camera.
  4. Restarting the daemon is not an option since it would affect the other cameras. We are looking for the least invasive restart.
  5. The pipeline I’m using is rather simple; the identity element is there to simulate the buffer drop:
    nvarguscamerasrc sensor-id=0 ! videoconvert ! identity name=identity drop-probability=0.0 ! interpipesink name=source_sink sync=false
  6. The reason we require the restart is that sometimes the capture fails with a timeout error. We have already tried increasing the timeout, but it doesn’t help. We are working on this separately but wanted the pipeline restart as a recovery mechanism.

Library Opened Successfully
Setting custom lib properties # 1
Adding Prop: mode : bayer
Inside Custom Lib : Setting Prop Key=mode Value=bayer
0:00:53.178325415    32     0x133c5000 DEBUG       nvarguscamerasrc gstnvarguscamerasrc.cpp:1447:gst_nv_argus_camera_set_caps:<nvarguscamerasrc7> Received caps video/x-raw(memory:NVMM), width=(int)720, height=(int)1280, framerate=(fraction)30/1, format=(string)NV12
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 31.622776; Exposure Range min 118000, max 33333000;

GST_ARGUS: Running with following settings:
   Camera index = 7 
   Camera mode  = 0 
   Output Stream W = 1920 H = 1080 
   seconds to Run    = 0 
   Frame Rate = 29.999999 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
max_fps_dur 8.33333e+06 min_fps_dur 2e+08
Opening in BLOCKING MODE 
max_fps_dur 8.33333e+06 min_fps_dur 2e+08
NvMMLiteOpen : Block : BlockType = 8 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8 
NVMEDIA: H265 : Profile : 1 
NVMEDIA_ENC: bBlitMode is set to TRUE 
2024-05-30 14:27:51,219 - CRITICAL - MainThread - Error on the bus NvArgusCameraSrc: TIMEOUT (6)
2024-05-30 14:27:51,220 - CRITICAL - MainThread - Error on the bus NvArgusCameraSrc: TIMEOUT (6)
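
For context, the per-camera recovery mechanism described above can be sketched roughly as follows. This is a minimal illustration and not our actual code: `CameraSupervisor`, the `Pipeline` protocol, and the `build` factory are hypothetical names, and in a real application `stop()` would set the GStreamer pipeline to NULL and drop the reference before a fresh one is built.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


class Pipeline(Protocol):
    """Hypothetical wrapper around one camera's GStreamer pipeline."""

    def start(self) -> None: ...
    def stop(self) -> None: ...


@dataclass
class CameraSupervisor:
    """Restarts a single camera's pipeline without touching the others."""

    build: Callable[[int], Pipeline]          # factory: sensor id -> fresh pipeline
    max_restarts: int = 3                     # per-camera restart budget
    backoff_s: float = 1.0                    # base delay before rebuilding
    sleep: Callable[[float], None] = time.sleep
    pipelines: Dict[int, Pipeline] = field(default_factory=dict)
    restarts: Dict[int, int] = field(default_factory=dict)
    last_error: Dict[int, str] = field(default_factory=dict)

    def start(self, sensor_id: int) -> None:
        """Build and start a fresh pipeline for one sensor."""
        pipeline = self.build(sensor_id)
        pipeline.start()
        self.pipelines[sensor_id] = pipeline

    def on_error(self, sensor_id: int, message: str) -> bool:
        """Handle a bus error; return True if a restart was attempted."""
        self.last_error[sensor_id] = message  # kept for diagnostics only
        count = self.restarts.get(sensor_id, 0)
        if count >= self.max_restarts:
            return False                      # budget spent: let the caller escalate
        self.restarts[sensor_id] = count + 1
        old = self.pipelines.pop(sensor_id, None)
        if old is not None:
            old.stop()                        # tear down only this camera's pipeline
        self.sleep(self.backoff_s * (2 ** count))  # exponential backoff
        self.start(sensor_id)
        return True
```

Each camera gets its own pipeline object, so restarting one does not touch the other streams or the daemon. The sketch only covers the restart bookkeeping; it does not address why the restart fails after a timeout.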

Thanks in advance for your help.

hello carolina.trejos,

have you also verified the basic camera functionality with v4l2 IOCTLs?

Hi @JerryChang,

Thanks for your response. In a loop of ~50 start/stop iterations with v4l2-ctl, I saw no issues. With Argus, I see this timeout issue in roughly 1 out of 5 iterations.
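
The v4l2 start/stop loop can be scripted along these lines. This is a minimal sketch: the device node, frame count, and timeout are assumptions, and `stress_start_stop` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import List, Sequence

# Assumed capture command: grab 30 frames from /dev/video0 and discard them.
V4L2_CMD = ["v4l2-ctl", "-d", "/dev/video0", "--stream-mmap",
            "--stream-count=30", "--stream-to=/dev/null"]


def stress_start_stop(cmd: Sequence[str], iterations: int = 50,
                      timeout_s: float = 10.0) -> List[int]:
    """Run cmd repeatedly and return the indices of the iterations that failed."""
    failed = []
    for i in range(iterations):
        try:
            result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL, timeout=timeout_s)
            ok = result.returncode == 0
        except subprocess.TimeoutExpired:
            ok = False  # a hung capture counts as a failure
        if not ok:
            print(f"iteration {i} failed", file=sys.stderr)
            failed.append(i)
    return failed


# Example (on the target device):
#   failures = stress_start_stop(V4L2_CMD)
#   print(f"{len(failures)} of 50 iterations failed")
```

The same helper can wrap a short gst-launch-1.0 run with nvarguscamerasrc to compare failure rates between the two capture paths.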

hello carolina.trejos,

is it a must to stay on JP-4.6.2/r32.7.2?
several bug fixes have been included in r32.7.4, and we’ve also seen stability improvements on r32.7.5,
so it would be great if you’re able to move to r32.7.5.

hello carolina.trejos,

here’s a libnvargus.so update for JP-4.6.5/r32.7.5 which may address your stability issue:
Topic320793_Jan23.zip (366.8 KB)

Hi @JerryChang,

We are evaluating an upgrade, but before jumping into it we wanted to see if we had other options. I tested the binary you provided but I’m getting this error.

$ gst-launch-1.0 nvarguscamerasrc sensor-id=1 ! perf ! fakesink -v 
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Error generated. gstnvarguscamerasrc.cpp, execute:740 No cameras available
/GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
/GstPipeline:pipeline0/GstPerf:perf0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
/GstPipeline:pipeline0/GstFakeSink:fakesink0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
/GstPipeline:pipeline0/GstPerf:perf0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
Got EOS from element "pipeline0".
Execution ended after 0:00:00.009346304
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

The argus log doesn’t contain much either.

=== NVIDIA Libargus Camera Service (0.98.3)=== Listening for connections...=== gst-launch-1.0[13372]: Connection established (7F8EFE01D0)OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module0
OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module1
OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module2
OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module3
OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module4
OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module5
OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module6
OFParserListModules: module list: /proc/device-tree/tegra-camera-platform/modules/module7
NvPclHwGetModuleList: WARNING: Could not map module to ISP config string
NvPclHwGetModuleList: No module data found
NvPclHwGetModuleList: WARNING: Could not map module to ISP config string
NvPclHwGetModuleList: No module data found
NvPclHwGetModuleList: WARNING: Could not map module to ISP config string
NvPclHwGetModuleList: No module data found
NvPclHwGetModuleList: WARNING: Could not map module to ISP config string
NvPclHwGetModuleList: No module data found
NvPclHwGetModuleList: WARNING: Could not map module to ISP config string
NvPclHwGetModuleList: No module data found
NvPclHwGetModuleList: WARNING: Could not map module to ISP config string
NvPclHwGetModuleList: No module data found
NvPclHwGetModuleList: WARNING: Could not map module to ISP config string
NvPclHwGetModuleList: No module data found
NvPclHwGetModuleList: WARNING: Could not map module to ISP config string
NvPclHwGetModuleList: No module data found
OFParserGetVirtualDevice: NVIDIA Camera virtual enumerator not found in proc device-tree
---- imager: Found override file [/var/nvidia/nvcam/settings/imx390_centerleft_liimx390.isp]. ----
CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
---- imager: Found override file [/var/nvidia/nvcam/settings/imx390_bottomleft_liimx390.isp]. ----
CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
---- imager: Found override file [/var/nvidia/nvcam/settings/imx390_topright_liimx390.isp]. ----
CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
---- imager: Found override file [/var/nvidia/nvcam/settings/imx390_bottomright_liimx390.isp]. ----
CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
---- imager: Found override file [/var/nvidia/nvcam/settings/imx390_topleft_liimx390.isp]. ----
CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
---- imager: Found override file [/var/nvidia/nvcam/settings/imx390_centerright_liimx390.isp]. ----
CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
---- imager: Found override file [/var/nvidia/nvcam/settings/imx390_rear_liimx390.isp]. ----
CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
---- imager: Found override file [/var/nvidia/nvcam/settings/imx390_front_liimx390.isp]. ----
CAM: serial no file already exists, skips storing againLSC: LSC surface is not based on full res!
=== gst-launch-1.0[13372]: CameraProvider initialized (0x7f88911d80)=== gst-launch-1.0[13372]: CameraProvider destroyed (0x7f88911d80)=== gst-launch-1.0[13372]: Connection closed (7F8EFE01D0)=== gst-launch-1.0[13372]: Connection cleaned up (7F8EFE01D0)

Some more details:

  1. The status of the service seems ok
  2. I can capture with v4l2
  3. I did restart the daemon and the board, just in case

Hi @JerryChang,

We are gathering more info on the issue. Any other suggestions?

Thanks!

hello carolina.trejos,

did you apply that libnvargus.so to r32.7.2 directly?

I re-built libnvargus on top of r32.7.5 with a couple of cherry-picked fixes,
so you must move to JP-4.6.5/r32.7.5 to apply that pre-built update; otherwise it might have dependency issues.
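
Before applying the pre-built library, it may help to confirm which L4T release is actually installed. Below is a minimal sketch, assuming the first line of /etc/nv_tegra_release has the usual `# R32 (release), REVISION: 7.2, ...` form; `parse_l4t_release`, `installed_release`, and `library_matches` are hypothetical helper names.

```python
import re
from typing import Optional

# Assumed format of the first line: "# R<major> (release), REVISION: <minor>, ..."
_RELEASE_RE = re.compile(r"R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)")


def parse_l4t_release(text: str) -> Optional[str]:
    """Return the release as 'major.minor' (e.g. '32.7.2'), or None if not found."""
    match = _RELEASE_RE.search(text)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def installed_release(path: str = "/etc/nv_tegra_release") -> Optional[str]:
    """Read the release file on the device and parse it."""
    with open(path, "r", encoding="utf-8") as handle:
        return parse_l4t_release(handle.readline())


def library_matches(text: str, required: str = "32.7.5") -> bool:
    """True only if the parsed release equals the one the library was built for."""
    return parse_l4t_release(text) == required
```

On the board, `installed_release()` returning anything other than `32.7.5` would suggest the library should not be dropped in as-is.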

Hi @JerryChang,

We updated and tested JP-5.1.2, and the error seems different but the behavior is ultimately the same. We get the following error.

Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
0:00:00.791217314 45606 0xaaaadea59b00 DEBUG nvarguscamerasrc gstnvarguscamerasrc.cpp:1449:gst_nv_argus_camera_set_caps: Received caps video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Error generated. /dvs/git/dirty/git-master_linux/multimedia/nvgstreamer/gst-nvarguscamera/gstnvarguscamerasrc.cpp, threadExecute:694 NvBufSurfaceFromFd Failed.
Error generated. /dvs/git/dirty/git-master_linux/multimedia/nvgstreamer/gst-nvarguscamera/gstnvarguscamerasrc.cpp, threadFunction:247 (propagating)
ERROR: from element /GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0: DISCONNECTED
Additional debug info:
Argus Error Status
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 1920 x 1080 FR = 29,999999 fps Duration = 33333334 ; Analog Gain range min 1,000000, max 31,622776; Exposure Range min 118000, max 33333000;

hello carolina.trejos,

honestly, there are several camera bugs in JP-5.1.2/r35.4.1.
for instance:

  1. Topic 268833: JP-5.1.2 camera firmware update for the deskew algorithm, plus stability fixes.
  2. Topic 268519: memory corruption within libnvargus in the multiple-camera use case.
  3. Topic 305949: fixes for Argus long-run stability issues.