Performance of nvargus-daemon

Hello,
we are using the TX2 with R32.2.1 on a customer carrier board. The main application captures images from up to 6 image sensors (IMX290, 1920x1080, 30 fps) connected via CSI, using the nvargus-daemon through the Multimedia API.

We are now optimizing the system, and it turns out that the nvargus-daemon consumes a lot of system resources.

Because the camera images are processed further by OpenCV, after ‘acquireFrame’ the YUV420 image is color-converted to ARGB32 and copied into CPU memory with the NvConverter.

The CPU load for one camera stream is around 20% and scales with the number of processed images.
Using the NvConverter even introduces a non-deterministic frame latency of 4 to 20 ms!

%Cpu0 : 44.5 us, 46.5 sy, 0.0 ni, 5.7 id, 0.0 wa, 3.0 hi, 0.3 si, 0.0 st
%Cpu3 : 54.0 us, 40.6 sy, 0.0 ni, 5.0 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu4 : 55.1 us, 39.5 sy, 0.0 ni, 4.7 id, 0.0 wa, 0.7 hi, 0.0 si, 0.0 st
%Cpu5 : 57.0 us, 38.0 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
MiB Mem : 7871.1 total, 4465.4 free, 3200.1 used, 205.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5564.7 avail Mem

4607 root 20 0 2644120 488096 364456 R 125.2 6.1 2:11.44 handle_cams_4
3743 root 20 0 18.7g 534972 48812 S 121.6 6.6 6:29.43 nvargus-daemon
4728 root 20 0 1522856 268424 197172 R 81.4 3.3 0:15.38 handle_cams_2

Questions:
1.) Is there a more efficient way of converting the captured images?
2.) Is there a way around the double color conversion used for OpenCV (YUV420 -> ARGB32 -> RGB)?
3.) What is the experience with implementing a custom CUDA color conversion kernel, and what performance impact can be expected? Does this approach work for multi-process applications?

And last: what exactly is the nvargus-daemon doing that consumes such a large share of the CPU?
Is it a soft ISP that runs not on a dedicated CPU but on the CPUs intended for customer use?

Hi,
Please check whether all sources can reach the target frame rate when running

$ gst-launch-1.0 nvarguscamerasrc maxperf=1 ! 'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080' ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false -v

If yes, please try

$ gst-launch-1.0 nvarguscamerasrc maxperf=1 ! 'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080' ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false -v

OpenCV takes video/x-raw,format=BGR in appsink, and the hardware converter engine does not support BGR. Please check

So to hook up with OpenCV, you need

nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR

This adds some CPU load.

Or you may run

to call cv2.cvtColor(img, cv2.COLOR_YUV2BGR_I420) for format conversion.
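
For reference, cv2.cvtColor with COLOR_YUV2BGR_I420 expects the planar I420 data as one single-channel 8-bit image with height*3/2 rows. Below is a minimal sketch of that layout arithmetic in plain Python. The helper name i420_layout is hypothetical, and even width and height are assumed.

```python
def i420_layout(width, height):
    """Return plane sizes in bytes and the 2-D shape used for an I420 frame."""
    if width % 2 or height % 2:
        raise ValueError("I420 needs even width and height")
    y_size = width * height                      # full-resolution luma plane
    chroma_size = (width // 2) * (height // 2)   # U and V are each subsampled 2x2
    total = y_size + 2 * chroma_size             # equals width * height * 3 / 2
    return {
        "y_size": y_size,
        "u_size": chroma_size,
        "v_size": chroma_size,
        "total": total,
        # Single-channel image: Y rows first, then the U and V planes packed below.
        "shape": (height * 3 // 2, width),
    }


layout = i420_layout(1920, 1080)
print(layout)
```

With NumPy, a uint8 buffer of layout["total"] bytes reshaped to layout["shape"] can be passed to cv2.cvtColor(img, cv2.COLOR_YUV2BGR_I420), which returns a 1080x1920 three-channel BGR image. Note that this conversion runs on the CPU.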

Hello DaneLLL,
thank you for your reply. We are perfectly clear on how to do color conversion and how to get frames into OpenCV. As I mentioned, we have a well-working system. The only open point is the resource usage of the nvargus-daemon: I would like clarification on where the system load comes from and whether there is a way to improve the performance.

Hi,
nvargus-daemon gets hardware DMA buffers from the ISP engine and passes them to the upper application. It should not take much CPU. You can check by running

  1. $ sudo jetson_clocks
  2. $ gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM)' ! fakesink
  3. $ sudo tegrastats

tegrastats then shows the CPU usage at a fixed frequency.
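
To compare runs, the per-core figures in the tegrastats output can be averaged. The sketch below assumes the CPU field has the form `CPU [load%@freq,...]` with `off` for offline cores, as commonly printed on L4T R32. The sample line is made up for illustration, and the field layout may differ between releases.

```python
import re


def parse_tegrastats_cpu(line):
    """Return one (load_percent, freq_mhz) tuple per core, or None if the core is off."""
    match = re.search(r"CPU \[([^\]]*)\]", line)
    if match is None:
        raise ValueError("no CPU field found")
    cores = []
    for entry in match.group(1).split(","):
        entry = entry.strip()
        if entry == "off":
            cores.append(None)
            continue
        load, freq = entry.split("%@")
        cores.append((int(load), int(freq)))
    return cores


def average_load(cores):
    """Average load in percent over the cores that are online."""
    online = [core[0] for core in cores if core is not None]
    return sum(online) / len(online)


# Hypothetical sample line, not taken from the thread.
sample = "RAM 1765/7860MB (lfb 1071x4MB) CPU [12%@2035,off,off,8%@2035,9%@2035,7%@2035] EMC_FREQ 0%"
cores = parse_tegrastats_cpu(sample)
print(cores)
print(average_load(cores))
```

Averaging only the online cores keeps the number comparable when some cores are disabled by the active nvpmodel mode.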

Hi,
For your reference, on Xavier/r32.3.1, we have seen 11% CPU usage for 6-cam preview; roughly 2% CPU usage per camera:

  1. Test command:
    $ sudo nvpmodel -m 0
    $ sudo jetson_clocks
    $ ./argus_camera -d 0 & (repeat with -d 1, 2, 3, 4, 5)
  2. Use top and switch Irix mode off (shift+i) to get the average CPU usage.

This eliminates other factors and shows the usage taken by Argus.

Hi,
I verified your numbers on our system by running Argus with and without color conversion.

no color conversion:

gst-launch-1.0 nvarguscamerasrc sensor-id=2 ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12' ! fakesink

irix mode off
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2761 root 20 0 18.0g 161052 40084 S 2.3 2.0 0:11.86 nvargus-daemon
3399 root 20 0 740808 24980 18352 S 0.6 0.3 0:02.85 gst-launch-1.0 n

irix mode on
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2761 root 20 0 19.1g 453412 46900 S 14.2 5.6 14:16.53 nvargus-daemon
4344 root 20 0 740836 24776 18216 S 4.0 0.3 0:00.32 gst-launch-1.0

activated color conversion

gst-launch-1.0 nvarguscamerasrc sensor-id=2 ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12' ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! fakesink

irix mode off
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3598 root 20 0 761464 46400 19268 S 4.5 0.6 0:16.61 gst-launch-1.0
2761 root 20 0 18.1g 214796 46860 S 2.5 2.7 0:37.86 nvargus-daemon

irix mode on
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4038 root 20 0 761468 46204 19068 S 27.2 0.6 0:17.65 gst-launch-1.0
2761 root 20 0 19.1g 448016 46860 S 14.2 5.6 12:10.11 nvargus-daemon

I still see a CPU usage of around 14% per camera stream (Irix mode on) for the Argus daemon, and it scales with the number of cameras used.

Hi,
Please check the difference between Irix mode on and off:


On a multi-core CPU system, the Irix mode off figure is the more meaningful one. Or you may run tegrastats to get the load of each core.
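
As a worked example of the difference: with Irix mode on, top reports %CPU relative to a single core, and with Irix mode off it divides that value by the number of CPUs. The sketch below applies this to the figures posted above, assuming all six TX2 cores are online. The function name is hypothetical.

```python
def irix_to_solaris(irix_percent, online_cpus):
    """Convert top's Irix-mode %CPU (single-core scale) to the Irix-off (whole-system) scale."""
    return irix_percent / online_cpus


ONLINE_CPUS = 6  # assumption: all six TX2 cores are online

for name, irix_on in [("nvargus-daemon", 14.2), ("gst-launch-1.0", 27.2)]:
    system_share = irix_to_solaris(irix_on, ONLINE_CPUS)
    print(f"{name}: {irix_on}% of one core = {system_share:.1f}% of the system")
```

This gives about 2.4% for nvargus-daemon and 4.5% for gst-launch-1.0, close to the 2.3 to 2.5% and 4.5% measured with Irix mode off. The readings were taken at different times, so a small mismatch is expected.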