libnvsipl.so sometimes crashes while running on DRIVE OS 6.0.8.1

Hi NVIDIA,

We found that nvsipl can crash with a segmentation fault after running for some time. The detailed trace is below:

#8 Object "/usr/lib/aarch64-linux-gnu/ld-2.31.so", at 0xffffffffffffffff, in
#7 Source "…/sysdeps/unix/sysv/linux/aarch64/clone.S", line 78, in thread_start [0xffff7f40062b]
#6 Source "/build/glibc-kcnsjy/glibc-2.31/nptl/pthread_create.c", line 477, in start_thread [0xffff881aa623]
#5 Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.28", at 0xffff7f5e0fab, in
#4 Object "/usr/lib/libnvsipl.so", at 0xffff847d09ff, in
#3 Object "/usr/lib/libnvsipl.so", at 0xffff847e168b, in
#2 Object "/usr/lib/libnvsipl.so", at 0xffff847e11ab, in
#1 Object "/usr/lib/libnvsipl.so", at 0xffff847e078b, in
#0 Object "/usr/lib/libnvsipl.so", at 0xffff847e050c, in
Segmentation fault (Address not mapped to object [(nil)])

Could you help check this?

==========================================

Please provide the following info (tick the boxes after creating this topic):
Software Version
[*] DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
[*] Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
[*] DRIVE AGX Orin Developer Kit (part number unknown)
other

SDK Manager Version
1.9.3.10904
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
[*] native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

Dear @chenzhaohong,
Could you please check whether the issue persists on DRIVE OS 6.0.10?

Hi Siva,

Thanks for your quick reply.

This issue was reproduced with our custom autonomous-driving application.

As you know, 6.0.10 is a very recent release. Moving our software stack from 6.0.8.1 to 6.0.10 would take a large effort, and we had no plan to upgrade within the next two months because of other higher-priority tasks in our team.

Is there a similar issue already known to NVIDIA and fixed in 6.0.10?

Dear @chenzhaohong,
Could you provide repro steps for a quick test on our side?

  1. Run the auto-parking application under the CGF application framework pipeline, with various algorithms and data recording enabled.
  2. Use four fisheye YUV cameras and one 120-degree front YUV camera.
  3. For some reason, the camera SDK uses DRIVE OS nvsipl instead of dwCamera.

Dear @chenzhaohong,

DW APIs use the SIPL framework to capture camera data.
Do you see the same issue when recording with nvsipl_camera or the DriveWorks recorder tool?

Hi Siva,

Previously we used nvsipl_camera only for basic driver verification and did not see any similar issue. dwCamera is not used in our software stack at present.

The issue occurs intermittently, after the application has been running for dozens of minutes.
Since we were able to capture the trace, could you try to symbolize it locally, or provide the debug libraries, so that we can find which function crashed? If any information is missing, please let me know and I will supply it.
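In case it helps with symbolizing, here is a minimal Python sketch that pulls the frame number, object path, and address out of the "Object" lines of the trace in the first post. The frame lines are copied from that trace with the quotes normalised to ASCII; the regular expression and the helper names (`parse_frames`, `frames_in`) are my own and only assume the line layout shown there.

```python
import re

# Frame lines copied from the trace in the first post (quotes normalised to ASCII).
TRACE = """\
#5 Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.28", at 0xffff7f5e0fab, in
#4 Object "/usr/lib/libnvsipl.so", at 0xffff847d09ff, in
#3 Object "/usr/lib/libnvsipl.so", at 0xffff847e168b, in
#2 Object "/usr/lib/libnvsipl.so", at 0xffff847e11ab, in
#1 Object "/usr/lib/libnvsipl.so", at 0xffff847e078b, in
#0 Object "/usr/lib/libnvsipl.so", at 0xffff847e050c, in
"""

# Assumed layout: #<n> Object "<path>", at 0x<hex>, in
FRAME_RE = re.compile(r'#(\d+)\s+Object\s+"([^"]+)",\s*at\s+(0x[0-9a-fA-F]+)')


def parse_frames(trace):
    """Return a list of (frame_number, object_path, address) tuples."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), int(match.group(3), 16)))
    return frames


def frames_in(frames, needle):
    """Keep only the frames whose object path contains `needle`."""
    return [frame for frame in frames if needle in frame[1]]


if __name__ == "__main__":
    for number, path, address in frames_in(parse_frames(TRACE), "libnvsipl"):
        print(f"#{number} {path} {address:#x}")
```

These are absolute runtime addresses, so on their own they do not identify a function. They still need the load address of libnvsipl.so in the crashed process before they can be turned into offsets inside the library.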

Dear @chenzhaohong,
The call trace is not sufficient to identify the issue.
Since the issue does not seem to occur with nvsipl_camera and you see it only with the CGF app, if you can provide a simple CGF app that reproduces it, I can check with the engineering team.

Dear Siva,
We were not able to reproduce this issue with nvsipl_camera or a simple CGF app. If the issue reproduces again, I will also dump the process memory map so that it can be used together with the trace to help locate the root cause.
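For reference, this is roughly how I plan to combine the process map with the trace: a minimal Python sketch that parses text in the `/proc/<pid>/maps` format and converts an absolute address into an offset inside the mapped file. The `SAMPLE_MAPS` lines are hypothetical placeholders (made-up ranges, device, and inode), not a real dump from our system, and the helper names are my own.

```python
# Hypothetical /proc/<pid>/maps excerpt; the ranges, device and inode are made up.
SAMPLE_MAPS = """\
ffff7f000000-ffff7f021000 rw-p 00000000 00:00 0
ffff84700000-ffff84900000 r-xp 00000000 b3:01 1234 /usr/lib/libnvsipl.so
"""


def parse_maps(text):
    """Parse maps text into (start, end, file_offset, path) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        start, end = (int(value, 16) for value in fields[0].split("-"))
        regions.append((start, end, int(fields[2], 16), fields[5]))
    return regions


def to_file_offset(regions, address):
    """Return (path, offset_in_file) for `address`, or None if it is unmapped."""
    for start, end, file_offset, path in regions:
        if start <= address < end:
            return path, address - start + file_offset
    return None


if __name__ == "__main__":
    regions = parse_maps(SAMPLE_MAPS)
    # Frame #0 address from the trace, resolved against the hypothetical map.
    print(to_file_offset(regions, 0xFFFF847E050C))
    # The faulting address reported as (nil) falls in no mapping.
    print(to_file_offset(regions, 0x0))
```

The resulting offset is only a position inside the library file. Turning it into a function name would still need an unstripped libnvsipl.so or matching debug symbols, which we do not have.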