Occasional camera stream interruptions

Software Version
DRIVE OS Linux 5.2.0

Target Operating System
Linux

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)

SDK Manager Version
1.7.0.8846

Host Machine Version
native Ubuntu 18.04

Occasionally, especially under high system load, we stop receiving new images from the attached IMX390 GMSL cameras for some time.

Using Nsight Systems, I found that when this happens, the SIPL_ICP_ISP_0 thread is stuck in an ioctl call inside this function:

NvMediaICPGetImageGroup
Begins: 31,4961s
Ends: 31,8883s (+392,239 ms)

Call stack:
libnvsipl.so ! 0x7f987dbe10
libpthread-2.27.so ! start_thread

This function makes two ioctl calls. The first one returns normally, as it always does:

ioctl
Begins: 31,4961s
Ends: 31,5201s (+24,006 ms)

Call stack at 31,4961s :
libc-2.27.so ! ioctl
libnvrm_host1x.so ! NvRmHost1xSyncpointWait
libnvmedia.so [2 Frames]
Nsight Systems frames
libnvsipl.so [4 Frames]
libstdc++.so.6.0.25 ! 0x7fa6fe8e94
libpthread-2.27.so ! start_thread

The second ioctl call is usually very short:

ioctl
Begins: 31,4869s
Ends: 31,487s (+84,064 μs)

Call stack at 31,4869s :
libc-2.27.so ! ioctl
libnvmedia.so [2 Frames]
Nsight Systems frames
libnvsipl.so [4 Frames]
libstdc++.so.6.0.25 ! 0x7fa6fe8e94
libpthread-2.27.so ! start_thread

…but when this issue happens, it takes very long to return:

ioctl
Begins: 31,5202s
Ends: 31,8882s (+368,014 ms)

Call stack at 31,5202s :
libc-2.27.so ! ioctl
libnvmedia.so [2 Frames]
Nsight Systems frames
libnvsipl.so [4 Frames]
libstdc++.so.6.0.25 ! 0x7fa6fe8e94
libpthread-2.27.so ! start_thread
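
To put the measurements above side by side, the reported durations can be compared with a few lines of Python. This is only a sketch: the helper name `parse_ms` is made up for illustration, and it assumes the trace prints durations with a decimal comma, as in the excerpts above.

```python
def parse_ms(text: str) -> float:
    """Convert a duration such as '+368,014 ms' or '+84,064 μs' to milliseconds.

    Assumes a decimal comma, as in the Nsight excerpts quoted in this post.
    """
    value, unit = text.strip().lstrip("+").split()
    # Accept both the micro sign (U+00B5) and the Greek letter mu (U+03BC).
    unit = unit.replace("\u00b5", "u").replace("\u03bc", "u")
    factor = {"s": 1000.0, "ms": 1.0, "us": 0.001}[unit]
    return float(value.replace(",", ".")) * factor


total = parse_ms("+392,239 ms")          # NvMediaICPGetImageGroup, stalled case
first_ioctl = parse_ms("+24,006 ms")     # ioctl via NvRmHost1xSyncpointWait
second_stalled = parse_ms("+368,014 ms") # second ioctl, stalled case
second_normal = parse_ms("+84,064 μs")   # second ioctl, normal case

share = second_stalled / total
slowdown = second_stalled / second_normal

print(f"second ioctl share of the stalled call: {share:.1%}")
print(f"slowdown versus the normal case: about {slowdown:.0f}x")
```

With the numbers quoted above, the second ioctl accounts for nearly all of the time spent in the stalled NvMediaICPGetImageGroup call, while the first ioctl stays in its usual range.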

Here is a screenshot from the collected trace, with the blocking ioctl call highlighted:

Hi, @david8mzkq

Are you using one of the Sony camera modules listed as supported for 5.2.0 on the "DRIVE Ecosystem - Hardware and Software | NVIDIA Developer" page? Is this issue reproducible with our sample application?

We are using camera modules with the Sony IMX390 chip that are supported on DRIVE OS 5.2.0.

Regarding the sample applications: no, we have not yet tried to reproduce this with them. I can look into doing that later.

However, the thread that is locking up in an ioctl call is created by the NVIDIA libraries, and it does not appear to run any user code.

I was hoping that it would be possible to diagnose the issue based on the information I have provided (especially given that it is the second ioctl call which puts the thread into uninterruptible sleep). If that is not possible, I will try to dig deeper into what exactly is happening here.
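
For anyone who wants to confirm the uninterruptible sleep without a profiler: on Linux it shows up as state `D` in `/proc/<pid>/task/<tid>/stat`. The following is a minimal sketch, assuming a standard procfs; the function names `parse_stat` and `threads_in_state` are hypothetical helpers, not part of any NVIDIA tooling.

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[str, str]:
    """Return (thread name, state) from one /proc/<pid>/task/<tid>/stat line.

    The name is wrapped in parentheses and may itself contain spaces or
    parentheses, so locate the last ')' instead of splitting on whitespace.
    """
    start = line.index("(") + 1
    end = line.rindex(")")
    state = line[end + 1:].split()[0]
    return line[start:end], state


def threads_in_state(pid: int, wanted: str = "D") -> List[Tuple[int, str]]:
    """List (tid, name) for threads of `pid` that are in state `wanted` right now."""
    hits = []
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                name, state = parse_stat(f.read())
        except OSError:
            continue  # the thread exited between listdir() and open()
        if state == wanted:
            hits.append((int(tid), name))
    return hits
```

Calling `threads_in_state(pid)` with the PID of the camera application returns the threads that are in state `D` at that instant. A single call is only a snapshot, so it would need to be polled while the stall is occurring to catch a thread such as SIPL_ICP_ISP_0.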

I will check with our team to see whether we have observed this too. In the meantime, it would be helpful if you could reproduce it with our sample application.

We haven’t seen the issue before. Can you try running the application at a higher priority than other tasks? Can you try on DRIVE OS 5.2.6? Can you provide steps to reproduce it for our investigation?

We are running the task with the SCHED_RR real-time policy at priority 40 (chrt -r 40).
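
For reference, `chrt -r 40 <command>` starts the command under SCHED_RR at priority 40. A process can also request the same policy for itself. Below is a minimal sketch using Python's `os` module on Linux; the helper name `request_round_robin` is hypothetical, and it assumes the process has the necessary privileges (CAP_SYS_NICE or a suitable RLIMIT_RTPRIO). Whether the threads created inside the NVIDIA libraries run with the same policy is not established in this thread.

```python
import os


def request_round_robin(priority: int = 40) -> bool:
    """Try to switch the caller to SCHED_RR at `priority`.

    Returns True on success and False when the kernel refuses, typically
    because the process lacks CAP_SYS_NICE or an RLIMIT_RTPRIO allowance.
    """
    low = os.sched_get_priority_min(os.SCHED_RR)
    high = os.sched_get_priority_max(os.SCHED_RR)
    if not low <= priority <= high:
        raise ValueError(f"priority must be in [{low}, {high}]")
    try:
        # pid 0 means "the caller"
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
    except OSError:
        return False
    return True


if request_round_robin(40):
    print("policy is SCHED_RR:", os.sched_getscheduler(0) == os.SCHED_RR)
else:
    print("could not switch to SCHED_RR; check privileges")
```

Checking the return value matters here because an unprivileged process silently keeps its previous policy otherwise, which would make the priority comparison with other tasks misleading.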

We are currently upgrading to 5.2.6 and I will let you know if this issue persists.

Are there any results you can share? Thanks.