How to use full CPU & GPU Potential of Jetson AGX Xavier

Hi guys!

I am working on a five-camera parallel object and lane detection system, and I want to improve the framerate.
Updating OpenCV and CUDA, running jetson_clocks, and multithreading with OpenMP have already helped. However, my CPU usage is still “only” somewhere between 60-70% (on each core).

Is there a way to “lift” the overall usage to 90-100%? Or is the bottleneck memory access on the SD card? Object detection on the GPU works fine, as the GPU is loaded at 50-100% all the time. Also, which strategies besides multithreading and parameter fine-tuning should I consider?

Regards,
Niklas

Hi Niklas,

Have you tried setting the system to maximum performance mode with the command below, to see whether it improves things?

sudo jetson_clocks

Hi,
You may run ‘sudo tegrastats’ to check the system load. It shows the load on all hardware engines and should give some information.

Documentation for tegrastats:
https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%2520Linux%2520Driver%2520Package%2520Development%2520Guide%2FAppendixTegraStats.html%23wwconnect_header

Hi

@kayccc, yes, as I mentioned above, I use jetson_clocks.

@DaneLLL I already monitor the system load with the jtop tool and with the system monitor. My tegrastats output confirms that the CPU load is around 60%. What other information can I extract from it?

RAM 6928/15690MB (lfb 2x2MB) SWAP 209/7845MB (cached 0MB) CPU [56%@2265,59%@2265,63%@2265,56%@2265,56%@2265,56%@2265,55%@2265,59%@2265] EMC_FREQ 23%@2133 GR3D_FREQ 66%@1377 APE 150 MTS fg 3% bg 15% AO@30.5C GPU@34C Tboard@32C Tdiode@34.5C AUX@31.5C CPU@36.5C thermal@33.6C PMIC@100C GPU 6154/6118 CPU 7689/7826 SOC 3383/3365 CV 0/0 VDDRQ 1382/1347 SYS5V 2950/2973
RAM 6928/15690MB (lfb 2x2MB) SWAP 209/7845MB (cached 0MB) CPU [58%@2265,57%@2265,57%@2265,65%@2265,58%@2265,60%@2265,55%@2265,58%@2265] EMC_FREQ 23%@2133 GR3D_FREQ 17%@1377 APE 150 MTS fg 4% bg 12% AO@30.5C GPU@34C Tboard@32C Tdiode@34.5C AUX@31.5C CPU@36C thermal@33.6C PMIC@100C GPU 6151/6121 CPU 7843/7827 SOC 3382/3367 CV 0/0 VDDRQ 1382/1350 SYS5V 2955/2971
RAM 6928/15690MB (lfb 2x2MB) SWAP 209/7845MB (cached 0MB) CPU [58%@2265,58%@2265,58%@2265,60%@2265,56%@2265,57%@2265,57%@2265,60%@2265] EMC_FREQ 23%@2133 GR3D_FREQ 88%@1377 APE 150 MTS fg 1% bg 11% AO@30.5C GPU@34C Tboard@32C Tdiode@34.5C AUX@32C CPU@36.5C thermal@33.6C PMIC@100C GPU 6154/6124 CPU 7843/7829 SOC 3383/3368 CV 0/0 VDDRQ 1381/1353 SYS5V 2950/2969
RAM 6928/15690MB (lfb 2x2MB) SWAP 209/7845MB (cached 0MB) CPU [59%@2265,59%@2265,56%@2265,62%@2265,63%@2265,59%@2265,60%@2265,63%@2265] EMC_FREQ 23%@2133 GR3D_FREQ 94%@1377 APE 150 MTS fg 2% bg 11% AO@30.5C GPU@34C Tboard@32C Tdiode@34.25C AUX@31.5C CPU@36.5C thermal@33.75C PMIC@100C GPU 6305/6139 CPU 7997/7843 SOC 3382/3369 CV 0/0 VDDRQ 1382/1355 SYS5V 2950/2967
RAM 6929/15690MB (lfb 2x2MB) SWAP 209/7845MB (cached 0MB) CPU [60%@2265,57%@2265,64%@2265,64%@2265,61%@2265,62%@2265,50%@2265,57%@2265] EMC_FREQ 23%@2133 GR3D_FREQ 5%@1377 APE 150 MTS fg 2% bg 12% AO@30.5C GPU@33.5C Tboard@32C Tdiode@34.5C AUX@31.5C CPU@36.5C thermal@33.6C PMIC@100C GPU 6151/6140 CPU 7843/7843 SOC 3382/3370 CV 0/0 VDDRQ 1381/1357 SYS5V 2990/2969

Thanks for your fast help guys!
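
For comparing runs, the tegrastats lines above can be reduced to a few numbers per sample. Below is a minimal Python sketch, assuming the line layout shown in this post (per-core entries like "56%@2265" inside "CPU [...]", plus "EMC_FREQ" and "GR3D_FREQ" fields). The function name parse_tegrastats_line is made up for illustration, and the handling of offline cores (reported as "off") is an assumption about other tegrastats outputs, not something visible in this log.

```python
import re
from statistics import mean

# Patterns match the field layout seen in the log lines of this thread.
CPU_RE = re.compile(r"CPU \[([^\]]+)\]")
GR3D_RE = re.compile(r"GR3D_FREQ (\d+)%")
EMC_RE = re.compile(r"EMC_FREQ (\d+)%")


def parse_tegrastats_line(line):
    """Extract per-core CPU load, GPU (GR3D) load and EMC load from one line."""
    loads = []
    cpu_match = CPU_RE.search(line)
    if cpu_match:
        for core in cpu_match.group(1).split(","):
            # Entries look like "56%@2265"; skip anything without a percentage
            # (assumption: offline cores show up as "off").
            if "%" in core:
                loads.append(int(core.split("%")[0]))
    gr3d = GR3D_RE.search(line)
    emc = EMC_RE.search(line)
    return {
        "cpu_loads": loads,
        "cpu_mean": mean(loads) if loads else None,
        "gpu_load": int(gr3d.group(1)) if gr3d else None,
        "emc_load": int(emc.group(1)) if emc else None,
    }


# First sample from the log above, shortened after the APE field.
sample = (
    "RAM 6928/15690MB (lfb 2x2MB) SWAP 209/7845MB (cached 0MB) "
    "CPU [56%@2265,59%@2265,63%@2265,56%@2265,56%@2265,56%@2265,55%@2265,59%@2265] "
    "EMC_FREQ 23%@2133 GR3D_FREQ 66%@1377 APE 150"
)

if __name__ == "__main__":
    print(parse_tegrastats_line(sample))
```

Run over all five samples, this makes it easy to see that the CPU cores stay close to 60% while GR3D_FREQ moves between 5% and 94% from one sample to the next.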

Hi,
Please share some information about your application. We don’t have experience with OpenMP, but other users may see this and offer suggestions. Also, the optimal software stacks are gstreamer and tegra_multimedia_api. You may look at the documents and check whether either one can be used for your use case. It should bring better performance.

The documents are at:
https://developer.nvidia.com/nvidia-jetson-linux-multimediaapireference
https://developer.nvidia.com/embedded/dlc/l4t-accelerated-gstreamer-guide-32-2
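
As a rough illustration of the gstreamer route, the sketch below only builds pipeline description strings, one per camera. It assumes CSI cameras reachable through the nvarguscamerasrc element and its sensor-id property (both described in the accelerated gstreamer guide); the thread does not say which kind of cameras are used, so USB cameras would need a different source element. The helper name build_csi_pipeline, the resolution, and the framerate are placeholders, not values from this thread.

```python
def build_csi_pipeline(sensor_id, width=1280, height=720, fps=30):
    """Return a gstreamer pipeline description for one CSI camera.

    Hypothetical helper: capture and colour conversion happen in
    nvarguscamerasrc / nvvidconv, and frames end up in an appsink as BGR.
    """
    return (
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        f"video/x-raw(memory:NVMM), width={width}, height={height}, "
        f"framerate={fps}/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! "
        # Keep only the newest frame so a slow consumer does not build a backlog.
        "appsink drop=true max-buffers=1"
    )


# One pipeline per camera for a five-camera setup (sensor ids 0..4 assumed).
pipelines = [build_csi_pipeline(i) for i in range(5)]

if __name__ == "__main__":
    for description in pipelines:
        print(description)
```

If OpenCV was built with GStreamer support, such a string can be passed to cv2.VideoCapture together with cv2.CAP_GSTREAMER; whether this actually raises the framerate for this application would need to be measured.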