Nvargus-daemon crash after 20-45 min with 8x ZED X GMSL2 cameras on AGX Orin — L4T 36.4.4

Hi,

We’re experiencing nvargus-daemon crashes after 20-45 minutes of continuous streaming with 8 GMSL2 cameras on Jetson AGX Orin. After extensive testing, we’ve narrowed this down to known nvargus/kernel issues.

We’re currently on L4T 36.4.4 (JetPack 6.2.1) and plan to OTA upgrade to L4T 36.5.0 (JetPack 6.2.2), since @JerryChang confirmed in topic 325399, comment #9, that the host1x-fence memory leak fix is included in r36.5.0. However, the r36.5.0 release notes don’t explicitly mention camera stability fixes, so we’d like to confirm whether all the relevant patches are included, or whether additional library replacements are still needed after upgrading.

Hardware

  • Jetson AGX Orin 64GB Developer Kit
  • 4x Stereolabs ZED X stereo cameras (8 GMSL2 streams total) via GMSL2 capture board
  • All cameras streaming 1080p30 simultaneously

Software

  • L4T 36.4.4 (JetPack 6.2.1), installed June 2025
  • ZED SDK 5.x
  • enableCamInfiniteTimeout=1 already set in nvargus-daemon service

Current library versions (L4T 36.4.4)

md5: 142547f5 /usr/lib/aarch64-linux-gnu/nvidia/libnvargus.so (1.0 MB)
md5: 8212ca9b /usr/lib/aarch64-linux-gnu/nvidia/libnvargus_socketserver.so (1.8 MB)
md5: ca971fd1 /usr/lib/aarch64-linux-gnu/nvidia/libnvscf.so (7.8 MB)
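
For anyone who wants to compare their own libraries against the list above, here is a minimal Python sketch that prints the same kind of line (truncated MD5 plus size). The 8-character prefix and the use of decimal megabytes are assumptions made for illustration; the paths are the ones listed in this post.

```python
import hashlib
import os

# Library paths as listed in this post; adjust if your install differs.
LIBS = [
    "/usr/lib/aarch64-linux-gnu/nvidia/libnvargus.so",
    "/usr/lib/aarch64-linux-gnu/nvidia/libnvargus_socketserver.so",
    "/usr/lib/aarch64-linux-gnu/nvidia/libnvscf.so",
]

def fingerprint(path, prefix_len=8):
    """Return (first prefix_len hex chars of the MD5, size in MB) for a file."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large libraries are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    size_mb = os.path.getsize(path) / 1e6  # decimal MB (assumption)
    return digest.hexdigest()[:prefix_len], size_mb

def report(paths=LIBS):
    """Print one line per library, skipping files that are not present."""
    for path in paths:
        if not os.path.exists(path):
            print(f"missing: {path}")
            continue
        md5_prefix, size_mb = fingerprint(path)
        print(f"md5: {md5_prefix} {path} ({size_mb:.1f} MB)")

if __name__ == "__main__":
    report()
```

Running it before and after an upgrade gives a quick way to see whether any of the three libraries actually changed.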

Symptoms

Two distinct failure modes, both after 20-45 minutes of continuous 8-camera streaming:

Mode 1 — FUSA VI handler SEGV:
nvargus-daemon crashes with SIGSEGV (signal 11). Journal shows:

SCF: Error InvalidState: (propagating from /dvs/git/…/fusaViHandler.cpp, function handleCorrErrors(), line 817)
SCF: Error InvalidState: Corr Error 8 for sensor ID X

This is followed by a daemon crash and a systemd restart. All camera sessions are invalidated after the restart.

Mode 2 — Camera FAILURE without daemon crash:
ZED SDK reports FAILURE in sl::Camera::grab() and CAMERA REBOOTING for all cameras simultaneously. Frame counts stop increasing. nvargus-daemon continues running (no SEGV in the journal).

In both cases, the only recovery is restarting nvargus-daemon + reopening all ZED camera sessions.
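
To tell the two modes apart automatically, something like the following Python sketch could be run over the daemon's journal lines. It is only an illustration: the regular expression is derived from the "Corr Error" line quoted above, the numeric sensor ID is an assumption (the log excerpt shows it as "X"), and `grab_failed` is a hypothetical flag the application would have to supply.

```python
import re

# Matches the second journal line quoted above. The post shows the sensor ID
# only as a placeholder "X"; a numeric ID is an assumption of this sketch.
CORR_ERROR = re.compile(r"Corr Error (\d+) for sensor ID (\d+)")

def classify(journal_lines, grab_failed):
    """Return (label, [(error_code, sensor_id), ...]).

    journal_lines: lines from the nvargus-daemon journal
    grab_failed:   True if the application saw grab() fail (hypothetical flag)
    """
    hits = []
    for line in journal_lines:
        match = CORR_ERROR.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    if hits:
        return "mode1", hits   # corr errors were logged before the failure
    if grab_failed:
        return "mode2", []     # streams stalled with nothing in the journal
    return "ok", []
```

The journal lines could come from `journalctl -u nvargus-daemon`, assuming the daemon runs under that unit name.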

Test results

We ran extensive stress tests:

| Test | Result |
|---|---|
| 10s recordings x30 | 30/30 PASS |
| 15-min recordings x8 (fresh nvargus restart before each batch) | 8/8 PASS |
| 15-min recordings x15 (no nvargus restart between batches) | 3/15 PASS; fails after ~30-45 min cumulative |
| 30-min continuous recording x3 | 0/3 PASS; fails at 8-22 min |

Short recordings with stop/start cycles are 100% reliable. The failure correlates with cumulative continuous streaming time, not recording count.

What we’ve checked on L4T 36.4.4

  1. enableCamInfiniteTimeout=1 — already enabled
  2. host1x-fence.ko: our module does not contain the dma_fence_put symbol, confirming that the memory leak fix from topic 325399 is missing
  3. capture-ivc.ko: our module does not contain semaphore symbols (sema_init, down_timeout), suggesting that the multi-camera race condition fix referenced in topic 355130 is missing
  4. libnvscf.so: our version is 7.8 MB, while the patched version from topic 355130 is 8.5 MB
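
A rough Python sketch of the kind of symbol check described in items 2 and 3 is shown below. It simply searches the module file for each name as a byte string, which is an assumption about how such a check might be done and not necessarily the exact method used here. The module paths are not given in this post, so they are passed in by the caller. A missing name is only a hint, since a compressed module or an inlined function would not show up this way.

```python
# Names taken from items 2 and 3 above, keyed by module file name.
CHECKS = {
    "host1x-fence.ko": ["dma_fence_put"],
    "capture-ivc.ko": ["sema_init", "down_timeout"],
}

def contains_symbol(module_path, symbol):
    """True if the symbol name appears as a byte string in the module file."""
    needle = symbol.encode("ascii")
    with open(module_path, "rb") as f:
        return needle in f.read()

def check_module(module_path, symbols):
    """Map each symbol name to whether it was found in the module file."""
    return {name: contains_symbol(module_path, name) for name in symbols}
```

For example, `check_module(path_to_capture_ivc, CHECKS["capture-ivc.ko"])` returns a dict with one True/False entry per name, where `path_to_capture_ivc` is whatever path the module has on your system.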

Questions

We plan to OTA upgrade to JetPack 6.2.2 (L4T r36.5.0). Before we do, can you confirm whether r36.5.0 includes the following fixes:

  1. host1x-fence memory leak / dma_fence_put fix (topic 325399). We believe yes per comment #9, but the r36.5.0 release notes don’t list it.
  2. capture-ivc semaphore: the multi-camera race condition fix referenced in topic 355130
  3. Updated libnvscf.so: the infinite timeout / long-run stability fix from topic 355130
  4. Updated libnvargus.so / libnvargus_socketserver.so: the stability patches previously distributed via forum threads (topic 324126, topic 320518)

If any of the above are not included in r36.5.0, could you provide patched binaries for that release?

Thank you!

*** Please note that this reply is generated by LLM automatically ***

Based on the provided context, I will attempt to provide a comprehensive answer to the question.

The issue described is related to nvargus-daemon crashes after 20-45 minutes of continuous streaming with 8 GMSL2 cameras on Jetson AGX Orin. The crashes are attributed to known nvargus/kernel issues.

To address this issue, the user is requesting the latest patched libraries and kernel modules for L4T 36.4.4. Specifically, they are looking for:

  1. Patched libnvscf.so with the infinite timeout fix (long-run stability)
  2. Patched host1x-fence.ko kernel module with the memory leak fix
  3. Patched capture-ivc.ko kernel module with the semaphore race condition fix

Or, if there is a newer BSP (point release) that includes all of these fixes, they would like to know about it.

Comparing the provided information with the context, we can see that:

  • The user is running L4T 36.4.4 with JetPack 6.2.1, installed in June 2025.
  • The current library versions are:
    • libnvargus.so (1.0 MB, md5 142547f5)
    • libnvargus_socketserver.so (1.8 MB, md5 8212ca9b)
    • libnvscf.so (7.8 MB, md5 ca971fd1)
  • The user has checked that their host1x-fence.ko module does not contain the dma_fence_put symbol, indicating that the memory leak fix from topic 325399 is not included.
  • The user has also checked that their capture-ivc.ko module does not contain semaphore symbols, suggesting that the multi-camera race condition fix is not included.
  • The user’s version of libnvscf.so is 7.8 MB, while the patched version from topic 355130 is 8.5 MB.

Based on this information, it appears that the user is experiencing nvargus-daemon crashes due to known nvargus/kernel issues. The user is requesting the latest patched libraries and kernel modules to address these issues.

To provide a comprehensive answer, I would recommend the following:

  • Check the NVIDIA website for the latest L4T releases and patches for Jetson AGX Orin.
  • Verify that the latest patches include the fixes for the memory leak and semaphore race condition issues.
  • If the latest patches are available, provide the user with the necessary instructions to update their system.
  • If there is a newer BSP (point release) that includes all of these fixes, provide the user with information on how to upgrade to the latest version.

In terms of comparing the provided information, it is clear that the user is experiencing nvargus-daemon crashes due to known nvargus/kernel issues. The user has checked that their module does not contain the symbol for the memory leak fix, so that fix is not included. The user’s version of libnvscf.so is 7.8 MB, while the patched version from topic 355130 is 8.5 MB. This suggests that the user’s version may not include the latest patches.

Overall, the user is requesting the latest patched libraries and kernel modules to address the nvargus-daemon crashes. Providing the user with the necessary instructions to update their system to the latest version should resolve the issues.


@JerryChang, it would be great if you could confirm whether the memory leak is fixed in JetPack 6.2.1, as you mentioned in "Kernel memory leak per-frame with Argus cameras - r36.4" (#9 by JerryChang).
Thanks.

hello willzeng,

– host1x-fence memory leak
The host1x memory leak fix is included in JP-6.2.2/r36.5, not in JP-6.2.1/r36.4.4.

– capture-ivc semaphore
Yes, JP-6.2.2/r36.5 also includes this change, which adds a semaphore to avoid the multi-camera race condition.

– Updated libnvscf.so
JP-6.2.2/r36.5 includes the fixes that make the enableCamInfiniteTimeout property work as expected.

– Updated libnvargus.so / libnvargus_socketserver.so
JP-6.2.2/r36.5 includes the stability fixes as well.
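
For anyone verifying a device after the upgrade, the following Python sketch checks whether the running release is at least r36.5. It assumes the usual first-line format of /etc/nv_tegra_release ("# R36 (release), REVISION: 4.4, ..."); the threshold comes from the answers above, and the helper names are made up for this example.

```python
import re

RELEASE_FILE = "/etc/nv_tegra_release"
FIXED_IN = (36, 5)  # per the answers above: JP-6.2.2 / r36.5

def parse_l4t_release(text):
    """Parse e.g. '# R36 (release), REVISION: 4.4, ...' into (36, 4, 4)."""
    match = re.search(r"R(\d+) \(release\), REVISION: (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised release string")
    return tuple(int(part) for part in match.groups())

def has_camera_fixes(version):
    """True if the (major, minor, patch) release is r36.5 or newer."""
    return version[:2] >= FIXED_IN

if __name__ == "__main__":
    try:
        with open(RELEASE_FILE) as f:
            version = parse_l4t_release(f.readline())
    except (OSError, ValueError) as err:
        print(f"could not read L4T release: {err}")
    else:
        status = "fixes expected" if has_camera_fixes(version) else "upgrade needed"
        print(version, status)
```

This only confirms the release number; it does not inspect the individual libraries or kernel modules.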