Segfault in libnvscf-owned thread when stopping Argus

  • Hardware Platform (Jetson): Jetson Orin Nano 4GB prod, 8GB prod
  • JetPack Version: JetPack 6.1 (r36.4.0)
  • Issue Type (questions, new requirements, bugs): bug
  • Camera type: Leopard Imaging Hawk stereo camera LI-AR0234CS-STEREO-GMSL2-30 with custom carrier board and MAX9296A deserializer

We are experiencing occasional segmentation faults when stopping Argus.

I managed to catch one segfault with gdb attached, and the call stack suggests it happens inside a thread owned by libnvscf.so (I don't have symbols for it):

#0 0x0000ffff9d8a3e40 in ?? ()
  from /usr/lib/aarch64-linux-gnu/nvidia/libnvscf.so
#1 0x0000ffff9d8b9a2c in ?? ()
  from /usr/lib/aarch64-linux-gnu/nvidia/libnvscf.so
#2 0x0000ffff9d8ba48c in ?? ()
  from /usr/lib/aarch64-linux-gnu/nvidia/libnvscf.so
#3 0x0000ffff9d8ce210 in ?? ()
  from /usr/lib/aarch64-linux-gnu/nvidia/libnvscf.so
#4 0x0000ffff9d8ce5dc in ?? ()
  from /usr/lib/aarch64-linux-gnu/nvidia/libnvscf.so
#5 0x0000ffff9d8cad7c in ?? ()
  from /usr/lib/aarch64-linux-gnu/nvidia/libnvscf.so
#6 0x0000ffff9d8c9f54 in ?? ()
  from /usr/lib/aarch64-linux-gnu/nvidia/libnvscf.so
#7 0x0000ffff9d729b78 in ?? ()
  from /usr/lib/aarch64-linux-gnu/nvidia/libnvos.so
#8 0x0000ffffa6450398 [PAC] in start_thread (arg=0x80e960)
  at ./nptl/pthread_create.c:442
#9 0x0000ffffa64b9e9c in thread_start ()
  at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
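Because there are no symbols, the absolute addresses in this backtrace change from run to run with ASLR. The offsets inside libnvscf.so do not, so they are more useful to whoever has symbols for the library. The load address of each library can be read with gdb's `info proc mappings`, or from `/proc/<pid>/maps` while the process is still alive. A minimal C++ sketch of the conversion, assuming the standard `/proc/<pid>/maps` line layout (`start-end perms offset dev inode path`); `resolve_address` is a hypothetical helper, not part of any NVIDIA tool:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Given the text of /proc/<pid>/maps and an absolute address from a backtrace,
// return "path+0xOFFSET", where OFFSET is the file offset inside the mapped
// object, or an empty string if no mapping contains the address.
std::string resolve_address(const std::string& maps_text, std::uint64_t addr) {
    std::istringstream lines(maps_text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string range, perms, offset_hex, dev, inode, path;
        if (!(fields >> range >> perms >> offset_hex >> dev >> inode)) continue;
        fields >> path;  // anonymous mappings have no path column
        const auto dash = range.find('-');
        if (dash == std::string::npos) continue;
        const std::uint64_t start = std::stoull(range.substr(0, dash), nullptr, 16);
        const std::uint64_t end = std::stoull(range.substr(dash + 1), nullptr, 16);
        if (addr < start || addr >= end) continue;  // not in this mapping
        const std::uint64_t file_off = std::stoull(offset_hex, nullptr, 16);
        std::ostringstream out;
        out << (path.empty() ? "[anon]" : path) << "+0x" << std::hex
            << (addr - start + file_off);
        return out.str();
    }
    return {};
}
```

Feeding it the mappings captured from the crashing process together with frame #0 (`0x0000ffff9d8a3e40`) would give a `libnvscf.so+0x…` offset that stays the same across runs of the same library build.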

We stop Argus as follows:

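status = iCaptureSession->cancelRequests();                // cancel pending capture requests
CHECK_STATUS("Failed to cancel requests");                  // OK
iCaptureSession->stopRepeat();                               // stop the repeating request (returns void, nothing to check)
status = iCaptureSession->waitForIdle(10'000'000'000ull);   // timeout in nanoseconds, i.e. 10 s
CHECK_STATUS("Failed to wait for idle capture");            // OK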
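The timeout given to waitForIdle is a `uint64_t` count of nanoseconds (as the Argus headers document it; worth double-checking for the installed version), so `10'000'000'000ull` means 10 seconds. A small C++ sketch that derives the value from `std::chrono` so the unit is explicit; `to_timeout_ns` and `kIdleTimeoutNs` are hypothetical names, not part of the Argus API:

```cpp
#include <chrono>
#include <cstdint>

// Convert any std::chrono duration to the nanosecond count Argus timeouts expect.
template <class Rep, class Period>
constexpr std::uint64_t to_timeout_ns(std::chrono::duration<Rep, Period> d) {
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(d).count());
}

// Same value as the 10'000'000'000ull literal, but self-documenting.
constexpr std::uint64_t kIdleTimeoutNs = to_timeout_ns(std::chrono::seconds(10));
static_assert(kIdleTimeoutNs == 10'000'000'000ull, "10 s expressed in ns");
```

The call would then read `iCaptureSession->waitForIdle(kIdleTimeoutNs)`.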

In particular, the waitForIdle call succeeds, but the segfault happens shortly afterwards.

In our application we do not use nvargus-daemon (we link against nvargus directly). This gist shows an example of our camera lifecycle code, if that helps: Argus camera usage sample (GitHub).

The issue is more likely to reproduce when the system is under heavier CPU load.
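One hypothesis that fits these symptoms (waitForIdle succeeds, the crash follows shortly after, and it is more frequent under CPU load): if waitForIdle only guarantees that no capture requests are in flight, a library-internal thread such as the libnvscf one in the backtrace may still run for a moment and touch an object that teardown has already released. The stack alone cannot confirm this, and libnvscf's internals are unknown. The generic rule that rules out this class of bug is "signal, join, then free". The following C++ sketch illustrates it with hypothetical types (`SharedState`, `Worker`, `teardown`); it is not Argus code:

```cpp
#include <atomic>
#include <memory>
#include <thread>

// Hypothetical state that a background thread keeps touching
// (stand-in for whatever a library thread might still reference).
struct SharedState {
    std::atomic<int> ticks{0};
};

// Hypothetical worker holding a NON-owning reference to SharedState: the
// situation in which freeing the state too early becomes a use-after-free.
class Worker {
public:
    explicit Worker(SharedState& state)
        : state_(state), thread_([this] { run(); }) {}

    // Signal the thread and wait for it to exit. Once this returns,
    // the thread can no longer touch state_.
    void shutdown() {
        stop_.store(true, std::memory_order_release);
        if (thread_.joinable()) thread_.join();
    }

    ~Worker() { shutdown(); }

private:
    void run() {
        while (!stop_.load(std::memory_order_acquire)) {
            state_.ticks.fetch_add(1, std::memory_order_relaxed);
            std::this_thread::yield();
        }
    }

    SharedState& state_;
    std::atomic<bool> stop_{false};
    std::thread thread_;  // declared last so it starts after stop_ is initialized
};

// Safe order: stop and join first, free the shared state second.
// Swapping these two lines is the bug pattern.
void teardown(Worker& worker, std::unique_ptr<SharedState>& state) {
    worker.shutdown();
    state.reset();
}
```

Under load, the window between signalling a thread and that thread's last access to shared state gets wider, which would be consistent with stress-ng making the crash more frequent.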

Is this issue known? Is there any known mitigation, patch, or workaround?

Could you confirm whether the issue still occurs with the latest release (r36.4.7)?

Also, please confirm whether an MMAPI sample such as argus/samples/eglImage/ can reproduce the issue.

Thanks

I'll do my best to test, but it might be tricky for us to upgrade right now.

Is there a patch I can quickly apply to test with r36.1, either a kernel patch or simply replacing the shared libraries with those from r36.4.7?

@ShaneCCC I cannot see a 36.4.7 release.


Regarding the MMAPI samples: is there one that exercises stereo camera modules (SyncSensorCalibrationData)?

You need to update with the commands below:

sudo apt update
sudo apt dist-upgrade
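After the upgrade, it is worth confirming which L4T release is actually installed before re-testing. On Jetson this is usually recorded in the first line of /etc/nv_tegra_release, in a form like `# R36 (release), REVISION: 4.7, ...` (`cat /etc/nv_tegra_release` shows it by hand). A minimal C++ sketch that assumes exactly that layout; `l4t_version` and `installed_l4t_version` are hypothetical helpers:

```cpp
#include <fstream>
#include <regex>
#include <string>

// Extract "R<major>" and "REVISION: <minor>.<patch>" from a release line and
// return e.g. "36.4.7", or "" if the line does not match the assumed layout.
std::string l4t_version(const std::string& first_line) {
    static const std::regex re(R"(R(\d+)\s*\(release\),\s*REVISION:\s*(\d+(?:\.\d+)*))");
    std::smatch m;
    if (!std::regex_search(first_line, m, re)) return {};
    return m[1].str() + "." + m[2].str();
}

// Read the first line of the release file (path overridable for testing).
std::string installed_l4t_version(const std::string& path = "/etc/nv_tegra_release") {
    std::ifstream in(path);
    std::string line;
    if (!std::getline(in, line)) return {};
    return l4t_version(line);
}
```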

I will need the public sources (Driver Package (BSP) Sources) for the kernel. Where can I download them? They are not in the list.

You can download the r36.4.4 source code and use it for r36.4.7.

Thank you

I tested the following:

  • r36.4.7 → still repros
  • syncStereo (argus_syncstereo) sample for stereo cameras → also repros, when I run the command many times in a row to exercise the “stop” path while stressing the system with stress-ng to simulate load. I suspect a concurrency/race condition.
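The repro procedure above (run the sample many times while stress-ng loads the CPUs) can be automated so that only genuine SIGSEGV terminations are counted, separately from ordinary non-zero exits. A minimal C++ sketch using POSIX `fork`/`execvp`/`waitpid`; `count_segfaults` is a hypothetical helper, and the sample's command line (e.g. `argus_syncstereo` plus whatever arguments you normally pass) is left to the caller. The load is started separately, e.g. `stress-ng --cpu 0` in another terminal:

```cpp
#include <csignal>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run the command in `argv` `runs` times and return how many runs were
// terminated by SIGSEGV, or -1 on a setup error.
int count_segfaults(const std::vector<std::string>& argv, int runs) {
    if (argv.empty() || runs < 0) return -1;

    // Build the NULL-terminated argument array before forking.
    std::vector<char*> cargs;
    for (const auto& a : argv) cargs.push_back(const_cast<char*>(a.c_str()));
    cargs.push_back(nullptr);

    int segfaults = 0;
    for (int i = 0; i < runs; ++i) {
        pid_t pid = fork();
        if (pid < 0) return -1;             // fork failed
        if (pid == 0) {                     // child: replace image with the target
            execvp(cargs[0], cargs.data());
            _exit(127);                     // only reached if exec failed
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) return -1;
        // A real crash shows up as "killed by signal", not as exit code 139.
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV) ++segfaults;
    }
    return segfaults;
}
```

For example, `count_segfaults({"argus_syncstereo"}, 200)` gives a crash count that can be compared with and without stress-ng running, or with and without CPU pinning.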

Do you have any other suggestions for fixing this?

Can you reproduce it without stress-ng?

If it can't be reproduced without stress-ng, maybe try assigning specific CPUs to the camera application with the taskset command.
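The taskset suggestion can be applied from the command line (`taskset -c 2,3 ./app` starts the app restricted to CPUs 2 and 3) or from inside the application before any Argus objects are created. A minimal C++ sketch of the in-process variant using Linux `sched_setaffinity`; `pin_to_cpus` and `allowed_cpus` are hypothetical helpers. It assumes libargus creates its worker threads after this call, so they inherit the mask (on Linux a new thread inherits the affinity of the thread that creates it):

```cpp
#include <sched.h>
#include <vector>

// Restrict the calling thread (and threads it creates later) to `cpus`.
// Returns false if the kernel rejects the mask.
bool pin_to_cpus(const std::vector<int>& cpus) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu : cpus) CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // 0 = calling thread
}

// List the CPUs the calling thread is currently allowed to run on.
std::vector<int> allowed_cpus() {
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> out;
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return out;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) out.push_back(cpu);
    return out;
}
```

For the experiment to isolate anything under load, stress-ng would also need to be kept off those CPUs, for example by running it under taskset on the remaining cores.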