High CPU usage streaming from CSI2 cameras on Jetson NX

We have been observing very high CPU usage when simply streaming from two CSI2 cameras on the Jetson NX. I am starting this thread to discuss the issue on the Jetson NX platform at the request of @JerryChang.

Background

This forum has previously discussed unexplained and concerningly high CPU loads when using the Nvidia Argus CSI2 Deepstream plugin (nvarguscamerasrc). Others have experienced this with the Jetson Nano as well.

This post aims to provide a concrete example that anyone with a Jetson device (we’re testing on Jetson NX) can run to see the high CPU usage for themselves.

Goal

This post seeks to understand why the overhead is so high, whether it can be reduced, and if so, how. We would also like everyone to be able to quickly run a couple of the same experiments we ran and observe the overhead for themselves.

Observing the Overhead

For these experiments we will use up to two IMX219-160 cameras, connected to the Jetson NX developer kit’s CSI2 ribbon cable connectors. You should configure the resolution and framerate in the experiments to values supported by your own cameras.

For a single CSI2 camera, you can run the following:

gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM), width=(int)3280, height=(int)2464, format=(string)NV12, framerate=(fraction)21/1' ! nvvidconv ! queue ! fakesink

To terminate this pipeline you can use CTRL-C in your terminal at any time to send SIGINT.

While this is running, open a new terminal and observe the CPU usage of /usr/sbin/nvargus-daemon. One way to do this is to run htop (installed via sudo apt install htop) and click on the CPU column to sort processes by CPU load. The nvargus-daemon process should appear at or near the top with around 18% CPU load.

An alternative means of viewing this is to run:

top -p `pgrep "nvargus"`
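If you would rather have a single averaged number than watch top, the following is a minimal sketch that reads the process's accumulated CPU time from /proc twice and reports the average over the interval. The function names cpu_ticks and cpu_percent are hypothetical helpers made up for this example, not part of any Nvidia tooling. The result is a percentage of one core, which should be comparable to what top and htop show by default.

```shell
#!/bin/sh
# Sketch: average CPU usage of one process over an interval, read from /proc.
# cpu_ticks and cpu_percent are hypothetical helper names for this example.

# Print the CPU ticks (utime + stime) a PID has consumed so far.
# The body runs in a subshell so that `set --` does not leak to the caller.
cpu_ticks() (
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || exit 1
    # Strip "pid (comm) " so a process name containing spaces cannot shift fields.
    stat=${stat##*\) }
    # With those two fields removed, utime and stime are fields 12 and 13.
    set -- $stat
    echo $(( ${12} + ${13} ))
)

# Print the average CPU usage (percent of one core) of PID $1 over $2 seconds.
cpu_percent() (
    pid=$1
    secs=${2:-5}
    hz=$(getconf CLK_TCK)
    t0=$(cpu_ticks "$pid") || exit 1
    sleep "$secs"
    t1=$(cpu_ticks "$pid") || exit 1
    LC_ALL=C awk -v d="$((t1 - t0))" -v hz="$hz" -v s="$secs" \
        'BEGIN { printf "%.1f\n", 100 * d / (hz * s) }'
)

# Measure nvargus-daemon for 10 seconds, but only if it is running.
if pid=$(pgrep -o nvargus); then
    cpu_percent "$pid" 10
fi
```

Start the gst-launch-1.0 pipeline first, then run this in a second terminal. A longer interval smooths out short spikes.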

For two cameras you can run:

gst-launch-1.0 nvarguscamerasrc sensor-id=1 ! 'video/x-raw(memory:NVMM), width=(int)3280, height=(int)2464, format=(string)NV12, framerate=(fraction)21/1' ! nvvidconv ! queue ! fakesink nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM), width=(int)3280, height=(int)2464, format=(string)NV12, framerate=(fraction)21/1' ! nvvidconv ! queue ! fakesink
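Since the two-camera command is just the single-camera branch repeated once per sensor, it can be generated rather than typed. The sketch below only prints the command; build_pipeline is a hypothetical helper name, and the caps string is copied from the commands above, so adjust the width, height and framerate to what your own cameras support.

```shell
#!/bin/sh
# Sketch: print (not run) a gst-launch-1.0 command with one capture branch
# per sensor id. build_pipeline is a hypothetical helper for this example.
# Usage: build_pipeline WIDTH HEIGHT FPS SENSOR_ID...
build_pipeline() (
    width=$1
    height=$2
    fps=$3
    shift 3
    cmd="gst-launch-1.0"
    for id in "$@"; do
        # Same caps as the commands in this post, with the numbers substituted.
        caps="video/x-raw(memory:NVMM), width=(int)$width, height=(int)$height, format=(string)NV12, framerate=(fraction)$fps/1"
        cmd="$cmd nvarguscamerasrc sensor-id=$id ! '$caps' ! nvvidconv ! queue ! fakesink"
    done
    printf '%s\n' "$cmd"
)

# Two cameras at the resolution and framerate used in this post.
build_pipeline 3280 2464 21 0 1
```

To actually launch the pipeline you could pass the printed command to eval, for example eval "$(build_pipeline 3280 2464 21 0 1)", as long as the arguments come from a trusted source.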

Observations

In MODE_10W_4CORE, with the clocks set as follows:

Online CPUs: 0-3
cpu0: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu1: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu2: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu3: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu4: Online=0 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu5: Online=0 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=1 c6=1
GPU MinFreq=114750000 MaxFreq=803250000 CurrentFreq=114750000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=0
Fan: PWM=130
NV Power Mode: MODE_10W_4CORE

We observed ~18% CPU utilization for a single camera and up to ~37% with two cameras.

Then we re-ran with the max clock frequency (using sudo jetson_clocks) and got:

SOC family:tegra194  Machine:NVIDIA Jetson Xavier NX Developer Kit
Online CPUs: 0-3
cpu0: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu1: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu2: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu3: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu4: Online=0 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu5: Online=0 Governor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400 IdleStates: C1=0 c6=0
GPU MinFreq=803250000 MaxFreq=803250000 CurrentFreq=803250000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Fan: PWM=130
NV Power Mode: MODE_10W_4CORE
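The dumps above are long, and what matters when comparing runs is which cores are online, their current frequency, and the power mode. The sketch below condenses a dump like the ones above into those lines. It assumes the input has the same layout as the output pasted in this post (which I believe comes from sudo jetson_clocks --show) and that the CPU frequencies are in kHz. summarize_clocks is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: condense a clock dump (same layout as the dumps in this post)
# into one line per online CPU plus the power mode line.
# summarize_clocks is a hypothetical helper name; CPU frequencies are
# assumed to be reported in kHz.
summarize_clocks() {
    LC_ALL=C awk '
        /^cpu[0-9]+:/ {
            online = freq = ""
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "Online") online = kv[2]
                if (kv[1] == "CurrentFreq") freq = kv[2]
            }
            # Skip cores that are offline in the current power mode.
            if (online == "1") printf "%s %.1f MHz\n", $1, freq / 1000
        }
        /^NV Power Mode:/ { print }
    '
}
```

On a Jetson this could be used as sudo jetson_clocks --show | summarize_clocks, or fed a saved dump on standard input.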

We saw similar values with the other clock frequency settings.

hello bmsp,

thanks for starting a new discussion thread for tracking.

may I know what your actual use case is? also, what’s your expectation for the CPU resource?
thanks

Hi Jerry,

One of our use cases is to capture data from multiple cameras and run object detection, classification, and tracking using Deepstream plugins.

We are evaluating NX for our use case, and we are finding that streaming the cameras without any actual processing already takes up a large amount of CPU, whereas our expectation was that CSI2 streaming would incur close to 0 CPU load, especially considering that most of the data reception should be handled at the HW level. It’s unclear why there is such a high CPU overhead. For comparison, the TX2 seems to have only a 3.2% overhead.

hello bmsp,

according to Topic 161718, the resolution there was 2592x1944 rather than 4K, and the test used the argus_camera application.
would you please test again with the argus_camera application?
for example,
$ ./argus_camera -i --kpi --module=3 --sensormode=0 --outputsize=1920x1440 -x

Hi Jerry,

The cameras I’m using support 1920x1080 (and cannot do 1920x1440), but it’s still close to the resolution you requested. I ran the following (had to remove -x in order to leave it running to profile).

argus_camera -i --kpi --module=3 --sensormode=0 --outputsize=1920x1080

Looking in htop I see the following approximate usages for nvargus-daemon:

  • 10W 4 Core (Mode 4) → 42%
  • 15W 4 Core (Mode 1) → 38%

I have run jetson_clocks to ensure the max clock speeds as well.

These results are similar to what we encountered in the original post with GStreamer.

hello bmsp,

how about running in MaxN mode (i.e. nvpmodel Mode-ID=0), with the CPU maximum frequency set to 1900 MHz?
thanks

Hi Jerry,

It’s coming in around 32% CPU usage by nvargus-daemon.

Thank you

hello bmsp,

thanks, I’ll also arrange resources to check this internally.

hello bmsp,

FYI, we’re still checking this internally.
the test results show that argus_camera has higher CPU usage with fewer CPU cores.
this can also be seen on the Jetson AGX Xavier platform, not only on NX.

Hi Jerry,

Thank you! I’ll stay tuned to see what you discover.

Hi Jerry,

Just following up here. Are there any updates?

Thank you!

hello bmsp,

FYI,
I don’t see near term plans to reduce the CPU usage.

to resolve confusion,
the maximum CPU frequency of Xavier NX is 1420 MHz, and we see higher CPU loading compared to TX2/TX2 NX (where the max freq is 2035 MHz).
the same scenario is also seen on Nano (the max freq of Nano is 1479 MHz).

for a fair comparison, it’s necessary to test with an identical camera running at the same resolution. since IMX219 is supported across both Nano and Xavier NX, we used the IMX219 camera sensor on the different Jetson platforms; the average CPU usage numbers look like below.
[Jetson Nano] 34% @ 1479 MHz;
[Jetson Xavier NX] 34% @ 1420 MHz;
thanks
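One rough way to put figures like these on a common scale is to multiply the utilization by the clock frequency, which gives an approximate "busy MHz" per platform. This is only a back-of-the-envelope sketch: it assumes the CPU stayed at the quoted frequency for the whole measurement and ignores architectural differences between the cores. busy_mhz is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: convert "percent of one core at a given clock" into approximate
# busy MHz. busy_mhz is a hypothetical helper name. Assumes the clock was
# constant during the measurement.
# Usage: busy_mhz PERCENT FREQ_MHZ
busy_mhz() {
    LC_ALL=C awk -v pct="$1" -v mhz="$2" \
        'BEGIN { printf "%.1f\n", pct * mhz / 100 }'
}

busy_mhz 34 1479   # Jetson Nano figure quoted above
busy_mhz 34 1420   # Jetson Xavier NX figure quoted above
```

By this measure the two platforms spend a similar amount of CPU time on the same camera, on the order of 480 to 500 MHz of one core.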