Jetson NX CSI camera with abnormally high CPU load average

Hello everyone,
I’ve run into a problem using CSI cameras on a Jetson Xavier NX (SDK version is jetson_sdk.32.4.3.nx).

Description:
1. Hardware
I use a MAX9286 as the GMSL deserializer, with 4 cameras as inputs. Its output is connected to the NX over 4-lane CSI-2.
2. Software
The MAX9286 driver runs in automatic mode after initialization, with no spinlock or uninterruptible sleep.
The application I use is based on jetson_multimedia_api/samples/12_camera_v4l2_cuda. I removed the CUDA-related parts, kept only the initialization and capture parts, and write the frame data to a file after every capture.
3. Parameters
Image format: 1280 x 960 (width x height), 25 FPS, V4L2_PIX_FMT_YUYV
NV power mode: 15W 6CORE

4. Phenomenon
It works: I get frames in the expected format, and they can be displayed with some transcoding tools. While the application is running, the “top” command shows low CPU usage for my process, about 0.7% per camera and scaling linearly. The load average, however, is abnormally high: it goes up by about 1 for each camera, also linearly, after running for more than 15 minutes.
5. Some tests and research
(1) I tried GStreamer without display and got the same result: a load average of about 1 per camera. The command line I used: gst-launch-1.0 -ev v4l2src device=/dev/video3 ! rawvideoparse format=4 width=1280 height=960 ! videoconvert ! fakesink video-sink=xvimagesink sync=false
(2) For comparison, I ran the same application with a USB camera. CPU usage was the same, but the USB camera gave a low load average, less than 0.1. The difference is that the USB camera can be regarded as a common USB char device, and it’s serial.
(3) I tried, but haven’t found any uninterruptible function in the V4L2 driver yet, either by reading it or by debug/tracing. I’m also not familiar with the Nvidia platform driver (the ‘nvidia’ directory in ‘kernel_src’).
(4) I saw that someone had the same problem in the thread “Jetson Nano CPU load average for CSI video recording unusually high”, but I couldn’t clearly follow how he solved it. It’s very inconvenient for me to modify and rebuild the kernel source.

So is this a problem in my driver, or in the platform driver?
As I understand it, the input call sequence in my setup is: CSI_dev → deserializer_driver → V4L2 driver → userspace. Is that right?
I’m out of ideas and need your help. I can provide more details, the application, and the deserializer driver source code if necessary.

Try v4l2-ctl to check the CSI/VI driver’s KPI and narrow the problem down.

v4l2-ctl --stream-mmap -d /dev/video*

Please run multiple cameras from different consoles and check whether the CPU usage is abnormal.

Running 4 cameras from 4 consoles with “v4l2-ctl --stream-mmap -d /dev/video*”, the load average is about 4, that is, 1 for each camera.
In my opinion, v4l2-ctl and my application use the same V4L2 driver API, so they are bound to give the same results. The FPS varies between 24.95 and 25.2 at first and finally settles at 25.00.
Best regards. Waiting for your reply.

I suppose v4l2-ctl is already the simplest pipeline, so it is the best case and there is little left to improve.

Thanks

So what could the problem be? Is there any way to optimize this, or have you seen similar problems before?

Sorry, but the CPU usage doesn’t look too high to accept.
Maybe you can check on r35.1 whether this newer release improves anything.

So is it normal for the CPU usage to be low while the load average is high? Will a high load average have a substantial performance impact on the system?

Sorry, I don’t understand what you mean by “load average is high”.
Could you give more specific information?

Thanks

In this picture, with four cameras running, the CPU is not significantly occupied, but the load average is very high.

Hi @tiancai1234,
Sorry for the delay. If it’s the same issue as it was in my case (and I doubt they’ve fixed that bug), then you will have no other choice but to recompile the kernel.
Unfortunately, for a SoC targeting the embedded/automotive market, they don’t care that much about load average.

The good news is that the issue may not have impact on your use case.
The bad news is that you’ll find some drivers using “uninterruptible sleep” instead of its cousin “interruptible sleep”, and then Linux will have a kernel thread marked as D (uninterruptible sleep) and the load average will reflect that.
The really bad news is that this issue may remain unresolved for years until support for your board is dropped.

I’m adding this link since it has helped me at times: Linux Load Averages: Solving the Mystery
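
To see whether this is what is happening on a given board, here is a minimal Python sketch (an illustration only, not taken from the linked article or from NVIDIA). On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones (state R), so listing the D-state threads while the cameras are streaming should show what is inflating the number. The function names are hypothetical; the only interfaces assumed are /proc/loadavg and /proc/<pid>/task/<tid>/stat.

```python
import glob


def parse_stat_line(line):
    """Return (tid, comm, state) from one /proc/<pid>/task/<tid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    tid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return tid, comm, state


def tasks_in_states(states=("R", "D")):
    """List (tid, comm, state) for every thread currently in one of `states`."""
    found = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                tid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # thread exited (or line was unreadable) during the scan
        if state in states:
            found.append((tid, comm, state))
    return found


def report():
    """Print the current load average and every thread in D state."""
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    for tid, comm, state in tasks_in_states(("D",)):
        print(tid, comm, state)
```

Calling report() a few times while v4l2-ctl is streaming gives a snapshot each time; a thread that only sleeps briefly can be missed by a single call. If roughly one D-state thread shows up per active camera, that would be consistent with the load average of about 1 per camera reported above.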

Best Regards,
Juan Pablo

@ShaneCCC,
Load average is often used as an indicator of the number of threads/processes a system is trying to run.

A Linux server might be okay with load averages as high as 128 as long as there are no excessive delays on any task at the time.
On systems running time-sensitive tasks, a load average equal to or higher than the number of available cores is often a bad sign, since tasks could be missing their deadlines.
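
As a rough illustration of that rule of thumb, the sketch below divides the load average by the core count. The helper names and the 1.0 threshold are assumptions made for the example. Keep in mind that on Linux the load average also includes tasks in uninterruptible sleep, so the ratio can overstate real CPU contention.

```python
import os


def load_per_core(load_1min, cores):
    """Return the 1-minute load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_1min / cores


def is_saturated(load_1min, cores, threshold=1.0):
    """True when the load is at or above `threshold` tasks per core."""
    return load_per_core(load_1min, cores) >= threshold


def current_load_per_core():
    """Same ratio for the running system (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return load_per_core(load_1min, os.cpu_count() or 1)
```

With the numbers from this thread (a load average of about 4 in the 6-core power mode), load_per_core(4.0, 6) is about 0.67, so the rule of thumb is not yet violated, although four more cameras would push it past the core count.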

Best Regards,
Juan Pablo


Hi @tiancai1234,
Before I forget, you might also want to check (if possible) whether this issue comes from the GMSL driver, the SoC driver or the camera driver.

You can do that if you have a devboard or a custom board that allows you to isolate these parts.

Best Regards,
Juan Pablo

Maybe use the taskset tool to pin the v4l2 capture to a specific CPU to avoid it.
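
A minimal sketch of that suggestion, assuming taskset and v4l2-ctl are both on the PATH: it builds a taskset command line that runs the same v4l2-ctl streaming test used earlier in this thread on one chosen core. The function names, the CPU index and the device path are placeholders for illustration. Whether pinning actually changes the load average is not confirmed anywhere in this thread.

```python
import subprocess


def build_pinned_command(cpu, device):
    """Build a taskset command that runs v4l2-ctl streaming on one CPU."""
    if cpu < 0:
        raise ValueError("cpu index must be non-negative")
    return ["taskset", "-c", str(cpu),
            "v4l2-ctl", "--stream-mmap", "-d", device]


def start_pinned_capture(cpu, device):
    """Start the pinned capture in the background and return its handle."""
    return subprocess.Popen(build_pinned_command(cpu, device))
```

For example, start_pinned_capture(5, "/dev/video0") would keep the first camera’s capture on CPU 5, leaving the other cores free for time-sensitive tasks.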
