Hello everyone,
I’ve run into a problem when using CSI cameras on a Jetson Xavier NX (SDK version jetson_sdk.32.4.3.nx).
Description:
1. Hardware
I use a MAX9286 as the GMSL deserializer, with 4 cameras as its inputs. Its output is connected to the NX over a 4-lane CSI-2 link.
2. Software
After initialization, the MAX9286 driver runs in automatic mode and uses no spinlocks or uninterruptible sleeps.
My application is based on jetson_multimedia_api/samples/12_camera_v4l2_cuda. I removed the CUDA parts, kept only the initialization and capture code, and write the frame data to a file after every capture.
3. Parameters
Image format: 1280x960 (width x height), 25 FPS, V4L2_PIX_FMT_YUYV
Power mode: 15W 6CORE
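For reference, the raw data rate implied by these parameters can be worked out directly. This is a small sketch that assumes packed YUYV at 2 bytes per pixel with no line padding (bytesperline = width * 2); the real driver may report a larger bytesperline.

```python
# Frame-size and bandwidth arithmetic for the capture parameters above.
# Assumption: V4L2_PIX_FMT_YUYV is packed 4:2:2 (Y0 U Y1 V), i.e. 2 bytes per
# pixel, and the driver adds no stride padding at the end of each line.

BYTES_PER_PIXEL_YUYV = 2


def yuyv_frame_bytes(width: int, height: int) -> int:
    """Size of one packed YUYV frame in bytes."""
    return width * height * BYTES_PER_PIXEL_YUYV


def stream_bytes_per_second(width: int, height: int, fps: float, cameras: int = 1) -> float:
    """Raw data rate delivered to userspace (and written to file) per second."""
    return yuyv_frame_bytes(width, height) * fps * cameras


if __name__ == "__main__":
    print(f"frame: {yuyv_frame_bytes(1280, 960)} bytes")
    print(f"one camera: {stream_bytes_per_second(1280, 960, 25) / 1e6:.2f} MB/s")
    print(f"four cameras: {stream_bytes_per_second(1280, 960, 25, cameras=4) / 1e6:.2f} MB/s")
```

That is 2457600 bytes per frame, 61.44 MB/s per camera and 245.76 MB/s for all four. Writing that much to a file can by itself put the writer into uninterruptible I/O wait, which is worth keeping in mind, although the v4l2-ctl and fakesink tests in this thread do not write frames to disk.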
4. Phenomenon
Capture works: I get frames in the expected format, which can be displayed with transcoding tools. However, while the application is running, "top" shows low CPU usage for my process (about 0.7% per camera, growing linearly with the number of cameras), yet the load average is abnormally high: up to about 1 per camera, also growing linearly, after running for more than 15 minutes.
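The gap between CPU usage and load average follows from how Linux defines load: it averages the number of tasks that are runnable (state R) plus those in uninterruptible sleep (state D), and a thread blocked in D state uses no CPU time. A quick way to see which of the two drives the number is to count threads by state while the cameras stream. A minimal sketch, assuming a standard Linux /proc layout:

```python
from __future__ import annotations

import os


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_stat_state(stat_line: str) -> tuple[str, str]:
    """Return (comm, state) from a /proc/<pid>/task/<tid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    start = stat_line.index("(")
    end = stat_line.rindex(")")
    return stat_line[start + 1:end], stat_line[end + 2]


def count_thread_states(proc: str = "/proc") -> dict[str, int]:
    """Count every thread (user and kernel) by its scheduler state letter."""
    counts: dict[str, int] = {}
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:  # the process exited while we were scanning
            continue
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    _, state = parse_stat_state(f.read())
            except OSError:
                continue
            counts[state] = counts.get(state, 0) + 1
    return counts


# Example on the board:
#   with open("/proc/loadavg") as f:
#       print("load 1/5/15:", parse_loadavg(f.read()))
#   states = count_thread_states()
#   print("R:", states.get("R", 0), "D:", states.get("D", 0))
```

If the D count stays close to the number of active cameras while R stays low, the load comes from blocked threads rather than from CPU work.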
5. Some tests and research
(1) I tried GStreamer without display and got the same result: a load average of about 1 per camera. The command line I used: gst-launch-1.0 -ev v4l2src device=/dev/video3 ! rawvideoparse format=4 width=1280 height=960 ! videoconvert ! fakesink sync=false
(2) For comparison, I ran the same application with a USB camera. CPU usage was the same, but the load average stayed below 0.1. The difference is that the USB camera is handled as an ordinary USB character device, and its data path is serial.
(3) I tried, but have not yet found any uninterruptible function in the V4L2 driver, either by reading the code or with debug/tracing. I am also not familiar with the NVIDIA platform driver (the ‘nvidia’ directory in ‘kernel_src’).
(4) "Jetson Nano CPU load average for CSI video recording unusually high"
Someone there had the same problem, but I did not clearly understand how it was solved. Modifying and rebuilding the kernel source is very inconvenient for me.
So is this a problem in my driver, or in the platform driver?
As I understand it, the input call sequence in my setup is: CSI_dev → deserializer_driver → V4L2 driver → userspace. Is that right?
I’m stuck and need your help. I can provide more details, the application and the deserializer driver source code if necessary.
Running "v4l2-ctl --stream-mmap -d /dev/video* " in 4 consoles, one per camera, gives a load average of about 4, i.e. 1 per camera.
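A small script can reproduce this four-console test from one place and log the 1-minute load average while the streams run. It is only a sketch: it assumes the cameras appear as /dev/video0 to /dev/video3 (hypothetical, adjust to your nodes) and that v4l2-ctl is on PATH; --stream-count bounds each stream so the script ends on its own.

```python
from __future__ import annotations

import subprocess
import time


def v4l2_stream_cmd(device: str, frames: int) -> list[str]:
    """v4l2-ctl command that streams `frames` frames using mmap buffers."""
    return ["v4l2-ctl", "-d", device, "--stream-mmap", f"--stream-count={frames}"]


def read_load1() -> float:
    """Current 1-minute load average from /proc/loadavg."""
    with open("/proc/loadavg") as f:
        return float(f.read().split()[0])


def run(devices: list[str], frames: int = 25 * 60 * 5, sample_every: float = 10.0) -> None:
    """Stream all devices in parallel (default: about 5 min at 25 fps) and log load."""
    procs = [
        subprocess.Popen(v4l2_stream_cmd(dev, frames),
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for dev in devices
    ]
    try:
        while any(p.poll() is None for p in procs):
            print(f"{time.strftime('%H:%M:%S')} load1={read_load1():.2f}", flush=True)
            time.sleep(sample_every)
    finally:
        # Stop any stream that is still running (e.g. after Ctrl+C).
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()


# Example on the board (hypothetical device nodes; adjust to your setup):
#   run([f"/dev/video{i}" for i in range(4)])
```

Running it with one, two, three and four devices should show whether the load still grows by about 1 per camera.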
As I understand it, v4l2-ctl and my application use the same V4L2 API, so it is no surprise that they give the same result. The frame rate varies between 24.95 and 25.2 at first and finally settles at 25.00.
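One plausible reason early frame-rate readings wander while the long-run rate is exactly 25.00: when fps is computed from per-frame timestamps (for example the V4L2 buffer timestamps) over only a few intervals, a single late or early frame shifts the result noticeably. The sketch below computes fps over a sliding window; it assumes timestamps in seconds, the window size is arbitrary, and it does not claim to reproduce how v4l2-ctl computes its own fps figure.

```python
from __future__ import annotations

from collections import deque


class FpsMeter:
    """Frames per second over a sliding window of recent frame timestamps."""

    def __init__(self, window: int = 50) -> None:
        # Keep window + 1 timestamps so that `window` frame intervals are measured.
        self._stamps: deque[float] = deque(maxlen=window + 1)

    def add(self, timestamp_s: float) -> None:
        """Record the capture timestamp (in seconds) of one frame."""
        self._stamps.append(timestamp_s)

    def fps(self) -> float | None:
        """Average rate over the window, or None until two frames are seen."""
        if len(self._stamps) < 2:
            return None
        span = self._stamps[-1] - self._stamps[0]
        if span <= 0:
            return None
        return (len(self._stamps) - 1) / span
```

With a 3-interval window, frames at 0, 0.040, 0.080 and 0.125 s read as 24.0 fps, while a steady 40 ms cadence over a longer window reads as 25.0.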
Best regards, looking forward to your reply.
So is it normal for CPU usage to be low while the load average is high? Will a high load average have a substantial performance impact on the system?
Hi @tiancai1234,
Sorry for the delay. If it’s the same issue as in my case (and I doubt they’ve fixed that bug), you will have no choice but to recompile the kernel.
Unfortunately, for an SoC targeting the embedded/automotive market, they don’t care that much about load average.
The good news is that the issue may not have any impact on your use case.
The bad news is that you’ll find some drivers using “uninterruptible sleep” instead of its cousin “interruptible sleep”; Linux then shows the kernel thread in state D (uninterruptible sleep), and the load average reflects that.
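To find out which threads are in that state, you can list every thread in uninterruptible sleep (state D in /proc, ps and top) together with its wchan, the kernel function it is waiting in. A minimal sketch, assuming a standard /proc layout; depending on kernel version, permissions and symbol support, wchan may read as 0 or a raw address instead of a function name, so running as root helps.

```python
from __future__ import annotations

import os


def parse_stat(stat_line: str) -> tuple[str, str]:
    """Return (comm, state); comm may contain spaces or ')', so use the last ')'."""
    start = stat_line.index("(")
    end = stat_line.rindex(")")
    return stat_line[start + 1:end], stat_line[end + 2]


def read_text(path: str) -> str:
    """Read a small /proc file, returning "" if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def d_state_threads(proc: str = "/proc") -> list[tuple[int, int, str, str]]:
    """List (pid, tid, comm, wchan) for every thread in uninterruptible sleep."""
    found: list[tuple[int, int, str, str]] = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue
        for tid in tids:
            stat = read_text(os.path.join(task_dir, tid, "stat"))
            if not stat:
                continue
            comm, state = parse_stat(stat)
            if state == "D":
                wchan = read_text(os.path.join(task_dir, tid, "wchan")) or "?"
                found.append((int(pid), int(tid), comm, wchan))
    return found


# Example on the board (run while the cameras are streaming):
#   for pid, tid, comm, wchan in d_state_threads():
#       print(pid, tid, comm, wchan)
```

A kernel thread that appears once per active camera and always waits in the same function is the likely contributor to the load.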
The really bad news is that this issue may remain unresolved for years until support for your board is dropped.
@ShaneCCC,
Load average is often used as an indicator of the number of threads/processes a system is trying to run.
A Linux server might be fine with load averages as high as 128, as long as no task is excessively delayed at the time.
On systems running time-sensitive tasks, a load average equal to or higher than the number of available cores is often a bad sign, since tasks could be missing their deadlines.
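To make the numbers concrete: the Linux load average is an exponentially damped moving average of the count of runnable (R) plus uninterruptible (D) tasks, sampled roughly every 5 seconds. The sketch below reimplements that update in Python, modelled on the kernel's fixed-point calc_load() (FSHIFT = 11, EXP_1/EXP_5/EXP_15 = 1884/2014/2037); it is a simplified single-counter model, not the kernel code itself.

```python
from __future__ import annotations

# Fixed-point constants as used by the Linux kernel's load-average code.
FSHIFT = 11
FIXED_1 = 1 << FSHIFT   # 1.0 in fixed point
EXP_1 = 1884            # 1/exp(5 s / 1 min) in fixed point
EXP_5 = 2014            # 1/exp(5 s / 5 min) in fixed point
EXP_15 = 2037           # 1/exp(5 s / 15 min) in fixed point
SAMPLE_SECONDS = 5      # the kernel samples roughly every 5 s (LOAD_FREQ)


def calc_load(load: int, exp: int, active: int) -> int:
    """One exponential-decay step, modelled on the kernel's calc_load()."""
    newload = load * exp + active * (FIXED_1 - exp)
    if active >= load:
        newload += FIXED_1 - 1   # round up while the load is rising
    return newload // FIXED_1


def simulate(active_tasks: int, seconds: int) -> tuple[float, float, float]:
    """Load averages after `seconds` with a constant number of R + D tasks."""
    active = active_tasks * FIXED_1
    avg1 = avg5 = avg15 = 0
    for _ in range(seconds // SAMPLE_SECONDS):
        avg1 = calc_load(avg1, EXP_1, active)
        avg5 = calc_load(avg5, EXP_5, active)
        avg15 = calc_load(avg15, EXP_15, active)
    return avg1 / FIXED_1, avg5 / FIXED_1, avg15 / FIXED_1


if __name__ == "__main__":
    # Four threads parked in D state (zero CPU use) for 15 minutes.
    one, five, fifteen = simulate(active_tasks=4, seconds=15 * 60)
    print(f"load average: {one:.2f}, {five:.2f}, {fifteen:.2f}")
```

Four permanently blocked threads drive the 1-minute figure to 4.00 without using any CPU, which matches the "1 per camera" pattern in the original report. On the NX in 15W 6CORE mode that is still below the core count mentioned above.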
Hi @tiancai1234,
Before I forget, you might also want to check (if possible) whether this issue comes from the GMSL driver, the SoC driver or the camera driver.
You can do that if you have a devboard or a custom board that allows you to isolate these parts.
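Another way to narrow it down without new hardware: the kernel stack of a thread that sits in D state shows which function, and often which module, it is sleeping in. As root, and on a kernel built with CONFIG_STACKTRACE, it can be read from /proc/<tid>/stack. The sketch below parses such a stack and maps frames to components through a prefix table you supply; the prefixes (for example "max9286" for the deserializer driver) and the function name in the test are hypothetical and have to be matched to the real symbol names on your board.

```python
from __future__ import annotations

import re

# One line of /proc/<tid>/stack, for example:
#   [<ffffff80080852d0>] __switch_to+0x98/0xb8
#   [<0>] some_func+0x24/0x60 [some_module]
_FRAME = re.compile(
    r"^\[<[0-9a-fA-F]+>\]\s+(?P<func>[^\s+]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frame(line: str) -> tuple[str, str | None] | None:
    """Return (function, module or None) for one stack line, else None."""
    m = _FRAME.match(line.strip())
    if m is None:
        return None
    return m.group("func"), m.group("module")


def attribute(stack_text: str, owners: dict[str, str]) -> list[str]:
    """Tag frames whose function name or module matches a (hypothetical) prefix."""
    hits = []
    for line in stack_text.splitlines():
        frame = parse_frame(line)
        if frame is None:
            continue
        func, module = frame
        for prefix, owner in owners.items():
            if func.startswith(prefix) or module == prefix:
                hits.append(f"{owner}: {func}")
                break
    return hits


def read_stack(tid: int) -> str:
    """Kernel stack of one thread; needs root and CONFIG_STACKTRACE."""
    with open(f"/proc/{tid}/stack") as f:
        return f.read()
```

If the blocked frames belong to your deserializer module, the problem is in your driver; if they only show platform capture code, it points at the SoC side.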