nvcamera-daemon ARM load

Hi,

I know nvcamera-daemon handles the capture side when you want to pass the data through the ISP [1]. We have been able to create V4L2 drivers that work with it, and in general it is stable and works well. My question is about the performance of the daemon: it consumes a lot of CPU, for instance about 50% of one core just to capture.

I read in another post that this is because the daemon does several computations, but I was wondering whether NVIDIA is also doing memory copies. If so, is there some way to avoid them? It would be nice to launch the daemon in a low-power-consumption mode that skips some of those computations or copies, accepting that the image quality may not be the best or that some statistics won't be produced. The user could then choose between the full mode and the low-power-consumption mode. We can help make these changes if needed.

Are there any plans to support such a low-power-consumption mode? Is there some way to disable some of those computations?

Thanks,
-David

[1] https://devtalk.nvidia.com/default/topic/934354/jetson-tx1/typical-approaches-to-test-camera-functionality-for-l4t-r23-2-on-jetson-tx1/post/4890615/#4890615

Hi DavidSoto,
The nvcamera-daemon does not do any extra memcpy. Most of the load comes from socket communication. If you set ‘enable-exif=true’ or ‘enable-meta=true’, there will be some extra load from copying metadata.

Please share how you profile to get ‘50% of one core’. Is the core at max clock?

Hi DaneLLL,

We are not using enable-exif or meta. You can see the pipeline and how we measured the ARM load:

Basically, we are using tegrastats. I am not running the max clocks script, but on other, less powerful SoCs we have seen that capturing frames alone should not load an ARM core this much, up to 50% of one core. We would like to help optimize it.

-David
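
Note that tegrastats reports load for the whole system, so its figures include gst-launch and anything else running, not only nvcamera-daemon. A minimal sketch of isolating a single process, assuming a Linux /proc filesystem: read /proc/<pid>/stat twice and compare the utime + stime counters. The function names, the PID, and the sample lines below are made up for illustration and are not measurements from this thread.

```python
def cpu_ticks_from_stat(stat_line):
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' before counting fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15 in proc(5)
    return utime + stime


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """Percent of one core used by the process over the interval."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


# Synthetic stat lines (made-up values), taken one second apart,
# assuming 100 clock ticks per second.
before = "1234 (nvcamera-daemon) S 1 1234 1234 0 -1 4194560 0 0 0 0 500 250 0 0 20 0 8 0 1000 0 0"
after = "1234 (nvcamera-daemon) S 1 1234 1234 0 -1 4194560 0 0 0 0 530 270 0 0 20 0 8 0 1000 0 0"

print(cpu_percent(cpu_ticks_from_stat(before), cpu_ticks_from_stat(after), 1.0, 100))
```

On a live system the two lines would come from reading /proc/<pid>/stat for the daemon's PID, and the tick rate from os.sysconf("SC_CLK_TCK").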

Hi DavidSoto,
The load seems fine, since the CPUs are running at a low frequency (204-307MHz).

RAM 1162/3995MB (lfb 268x4MB) cpu [20%,30%,30%,25%]@307 EMC 11%@665 AVP 20%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 1162/3995MB (lfb 268x4MB) cpu [25%,19%,32%,23%]@204 EMC 11%@665 AVP 20%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 1162/3995MB (lfb 268x4MB) cpu [19%,21%,33%,17%]@204 EMC 11%@665 AVP 20%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 1162/3995MB (lfb 268x4MB) cpu [45%,17%,10%,16%]@307 EMC 11%@665 AVP 20%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 1162/3995MB (lfb 268x4MB) cpu [11%,24%,38%,15%]@204 EMC 11%@665 AVP 20%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 1162/3995MB (lfb 268x4MB) cpu [29%,14%,21%,13%]@204 EMC 11%@665 AVP 31%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734

Please run ‘sudo ./tegrastats’ to get all information.

Hi DaneLLL,

In your output, for instance, the first row shows 20+30+30+25=105%, which is equivalent to one single core completely loaded. That seems odd, doesn’t it?

-David

Hi David,
Please refer to Appendix -> Tegra Stats Utility in the documentation.

It means
cpu0 loading 20%@307MHz
cpu1 loading 30%@307MHz
cpu2 loading 30%@307MHz
cpu3 loading 25%@307MHz
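
The same breakdown can be extracted programmatically. This is a minimal sketch that assumes the tegrastats line format shown in this thread (r24.2.1); other releases may print the cpu field differently. The regular expression and the function name are illustrative, not part of any NVIDIA tool.

```python
import re

# Matches the per-core field as it appears in this thread,
# e.g. "cpu [20%,30%,30%,25%]@307".
CPU_FIELD = re.compile(r"cpu \[([^\]]*)\]@(\d+)")


def parse_cpu(line):
    """Return (per-core load percentages, CPU clock in MHz) for one tegrastats line."""
    match = CPU_FIELD.search(line)
    if match is None:
        raise ValueError("no cpu field found")
    loads = [int(p.rstrip("%")) for p in match.group(1).split(",")]
    return loads, int(match.group(2))


sample = ("RAM 1162/3995MB (lfb 268x4MB) cpu [20%,30%,30%,25%]@307 "
          "EMC 11%@665 AVP 20%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734")
loads, mhz = parse_cpu(sample)
for core, load in enumerate(loads):
    print(f"cpu{core} loading {load}%@{mhz}MHz")
```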

Hi DaneLLL,

Yes, that is my point: having the load distributed across the cores is almost the same as having one core 105% loaded, correct? How can we help reduce that?

-David

Hi David,
Since each core can go up to 1.7GHz, I think it averages out to about 15% load on one core.
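
A rough way to reproduce this kind of estimate is to sum the per-core loads and rescale them from the current clock to the maximum clock. The sketch below assumes that CPU work scales linearly with clock frequency, which is a simplification (memory-bound work does not scale this way). The function name is hypothetical; the input rows are the first two 307MHz and 204MHz tegrastats rows quoted earlier in the thread.

```python
def equivalent_load_at_max(loads_percent, current_mhz, max_mhz):
    """Total load, as a percentage of one core, rescaled to max_mhz.

    Assumes the amount of work done is proportional to the clock frequency.
    """
    return sum(loads_percent) * current_mhz / max_mhz


# Rows taken from the low-clock tegrastats output; 1734 MHz is the maximum clock
# shown in the later outputs.
print(round(equivalent_load_at_max([20, 30, 30, 25], 307, 1734), 1))  # 18.6
print(round(equivalent_load_at_max([25, 19, 32, 23], 204, 1734), 1))  # 11.6
```

Both values are in the neighborhood of the ~15% figure, which is consistent with the low-clock readings looking much larger than the max-clock ones.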

RAM 1149/3995MB (lfb 221x4MB) cpu [4%,0%,2%,4%]@1734 EMC 4%@1600 AVP 65%@12 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1149/3995MB (lfb 221x4MB) cpu [0%,0%,3%,7%]@1734 EMC 4%@1600 AVP 65%@12 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1149/3995MB (lfb 221x4MB) cpu [2%,2%,7%,2%]@1734 EMC 4%@1600 AVP 65%@12 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1149/3995MB (lfb 221x4MB) cpu [4%,0%,6%,1%]@1734 EMC 4%@1600 AVP 65%@12 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1149/3995MB (lfb 221x4MB) cpu [9%,6%,0%,1%]@1734 EMC 4%@1600 AVP 65%@12 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734

I’m having the same problem, but after running the max clock script I still have a very high CPU consumption:

RAM 844/3995MB (lfb 659x4MB) cpu [23%,19%,29%,38%]@1734 EMC 41%@204 AVP 27%@12 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734

I’m using the same gstreamer pipeline DavidSoto mentioned. Any word on why the CPU consumption would be so high?

We have confirmed a CPU load of ~15% of one core on r24.2.1 with the onboard camera. Are you on r24.2.1?

Yes, this is 24.2.1. I’m using a TX1 on an Auvidea J90 board. I did notice that the EMC clock appears to be slow. Could that have something to do with it?

Hi Allanm, what is your sensor? Please try to run ‘nvcamerasrc ! fakesink’ and check the tegrastats.

I’m using an IMX219 with patches from RidgeRun. Running that pipeline I get:

RAM 814/3995MB (lfb 596x4MB) cpu [26%,100%,19%,8%]@1734 GR3D 0%@998 EDP limit 0
RAM 814/3995MB (lfb 596x4MB) cpu [24%,100%,17%,13%]@1734 GR3D 0%@998 EDP limit 0
RAM 814/3995MB (lfb 596x4MB) cpu [19%,100%,24%,14%]@1734 GR3D 0%@998 EDP limit 0
RAM 814/3995MB (lfb 596x4MB) cpu [21%,100%,16%,20%]@1734 GR3D 0%@998 EDP limit 0
RAM 814/3995MB (lfb 596x4MB) cpu [21%,100%,14%,18%]@1734 GR3D 0%@998 EDP limit 0
RAM 814/3995MB (lfb 596x4MB) cpu [9%,100%,13%,26%]@1734 GR3D 0%@998 EDP limit 0
RAM 814/3995MB (lfb 596x4MB) cpu [17%,100%,9%,19%]@1734 GR3D 0%@998 EDP limit 0
RAM 770/3995MB (lfb 599x4MB) cpu [23%,93%,22%,11%]@1734 GR3D 0%@998 EDP limit 0
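
What stands out in these rows is that one core stays at or near 100% in every sample, which is the pattern a single busy thread would produce (tegrastats alone does not say which process it is). A small sketch that checks for this pattern, using the per-core values copied from the rows above; the 90% threshold and the function names are arbitrary choices for illustration.

```python
# Per-core loads copied from the tegrastats rows above (all sampled at 1734 MHz).
samples = [
    [26, 100, 19, 8],
    [24, 100, 17, 13],
    [19, 100, 24, 14],
    [21, 100, 16, 20],
    [21, 100, 14, 18],
    [9, 100, 13, 26],
    [17, 100, 9, 19],
    [23, 93, 22, 11],
]


def pinned_cores(samples, threshold=90):
    """Return indices of cores whose load is at or above threshold in every sample."""
    core_count = len(samples[0])
    return [c for c in range(core_count)
            if all(row[c] >= threshold for row in samples)]


def mean_total_load(samples):
    """Average of the summed per-core load, in percent of one core."""
    return sum(sum(row) for row in samples) / len(samples)


print(pinned_cores(samples))     # [1]
print(mean_total_load(samples))  # 152.0
```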

Hi Allanm, can you get help from RidgeRun? The CPU usage is very different compared to devkit + onboard_ov5693.

Hi Allanm,

Please send an email to support@ridgerun.com describing the resolution you are using for testing and the number of cameras, and please share the pipeline you are using.

DaneLLL, when you ran the test, which resolution and framerate did you use?

-David

Hi David, please refer to

ubuntu@tegra-ubuntu:~$ gst-launch-1.0 nvcamerasrc num-buffers=600 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvoverlaysink & sudo ./tegrastats
[1] 2303
RAM 1046/3995MB (lfb 554x4MB) cpu [0%,0%,0%,0%]@1734 EMC 2%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
Setting pipeline to PAUSED ...
Inside NvxLiteH264DecoderLowLatencyInitNvxLiteH264DecoderLowLatencyInit set DPB and MjstreamingInside NvxLiteH265DecoderLowLatencyInitNvxLiteH265DecoderLowLatencyInit set DPB and Mjstreaming
Available Sensor modes :
2592 x 1944 FR=30.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock

NvCameraSrc: Trying To Set Default Camera Resolution. Selected 1920x1080 FrameRate = 30.000000 ...

RAM 1095/3995MB (lfb 548x4MB) cpu [25%,20%,12%,5%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1094/3995MB (lfb 548x4MB) cpu [5%,4%,0%,6%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1094/3995MB (lfb 548x4MB) cpu [5%,2%,0%,6%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1094/3995MB (lfb 548x4MB) cpu [2%,16%,0%,1%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1094/3995MB (lfb 548x4MB) cpu [5%,9%,1%,7%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1094/3995MB (lfb 548x4MB) cpu [1%,8%,0%,9%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [5%,8%,6%,4%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [1%,13%,6%,7%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [4%,10%,6%,14%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [5%,7%,2%,8%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [10%,4%,7%,7%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [5%,1%,8%,12%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [8%,1%,6%,7%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [6%,5%,2%,6%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [1%,5%,6%,11%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [0%,0%,5%,13%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [3%,6%,1%,11%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [0%,5%,7%,7%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [1%,8%,5%,10%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1095/3995MB (lfb 548x4MB) cpu [0%,9%,3%,10%]@1734 EMC 4%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
Got EOS from element "pipeline0".
Execution ended after 0:00:20.163733318
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
RAM 1062/3995MB (lfb 548x4MB) cpu [4%,3%,5%,6%]@1734 EMC 5%@1600 AVP 4%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1062/3995MB (lfb 548x4MB) cpu [0%,0%,0%,0%]@1734 EMC 3%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1061/3995MB (lfb 548x4MB) cpu [0%,0%,0%,0%]@1734 EMC 2%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 1061/3995MB (lfb 548x4MB) cpu [0%,0%,1%,0%]@1734 EMC 2%@1600 AVP 3%@80 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
^C
[1]+  Done                    gst-launch-1.0 nvcamerasrc num-buffers=600 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvoverlaysink

It is on r24.2.1, devkit+onboard_ov5693

Please refer to http://developer.ridgerun.com/wiki/index.php?title=Imx219_vs_ov5693_armload for an ARM load comparison between the IMX219 and the OV5693.