AGX Xavier runs code slowly

Hi everyone,

I have image stitching code developed with OpenCV CUDA.
When I run it on my laptop, which has an Nvidia GeForce 920M GPU, it takes around 3.5 seconds. When I run the same code on the AGX Xavier developer kit in MAXN mode, it takes ~3.2 seconds. Is this normal? I expected much better performance from the dev kit than from my laptop.

The laptop specs:
8 GB RAM
Nvidia GeForce 920M GPU
Intel Core i5, 5th generation

The AGX dev kit is the 32 GB model.

Thanks.

Hi,
Please run sudo tegrastats and share the output for reference. We would like to see the system load while the image stitching code is running. Please also share your release version ($ head -1 /etc/nv_tegra_release).

Hi DaneLLL,

The tegrastats output while the code is running:

RAM 2416/31921MB (lfb 7006x4MB) SWAP 0/15960MB (cached 0MB) CPU [4%@ 1190,2%@ 1190,3%@ 1190,3%@ 1190,1%@ 1190,1%@ 1265,0%@ 1267,0%@ 1396] EMC_FREQ 1%@ 665 GR3D_FREQ 0%@318 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 4% AO@33.5C GPU@33.5C Tdiode@36.5C PMIC@100C AUX@33C CPU@35C thermal@33.75C Tboard@34C GPU 466/466 CPU 466/466 SOC 933/933 CV 0/0 VDDRQ 311/311 SYS5V 1760/1760
RAM 2416/31921MB (lfb 7006x4MB) SWAP 0/15960MB (cached 0MB) CPU [4%@ 1190,4%@ 1190,8%@ 1190,4%@ 1190,1%@ 1190,2%@ 1317,1%@ 1343,0%@ 1343] EMC_FREQ 2%@665 GR3D_FREQ 0%@318 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 5% AO@33C GPU@33.5C Tdiode@36.75C PMIC@100C AUX@33C CPU@35C thermal@33.75C Tboard@34C GPU 466/466 CPU 622/544 SOC 933/933 CV 0/0 VDDRQ 311/311 SYS5V 1800/1780
RAM 2416/31921MB (lfb 7006x4MB) SWAP 0/15960MB (cached 0MB) CPU [1%@ 1190,2%@ 1190,1%@ 1190,0%@ 1190,0%@ 1190,0%@ 1265,0%@ 1267,0%@ 1483] EMC_FREQ 0%@ 2133 GR3D_FREQ 0%@ 318 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 2% AO@33.5C GPU@33.5C Tdiode@36.75C PMIC@100C AUX@33.5C CPU@35C thermal@33.6C Tboard@34C GPU 466/466 CPU 466/518 SOC 1244/1036 CV 0/0 VDDRQ 311/311 SYS5V 1840/1800
RAM 2416/31921MB (lfb 7006x4MB) SWAP 0/15960MB (cached 0MB) CPU [2%@1190,0%@1190,1%@ 1190,0%@ 1190,0%@ 1190,0%@ 1321,0%@ 1343,0%@ 1524] EMC_FREQ 1%@665 GR3D_FREQ 0%@318 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 3% AO@33.5C GPU@33.5C Tdiode@36.75C PMIC@100C AUX@33.5C CPU@35.5C thermal@34.1C Tboard@34C GPU 466/466 CPU 466/505 SOC 2020/1282 CV 0/0 VDDRQ 310/310 SYS5V 2240/1910
RAM 2485/31921MB (lfb 7005x4MB) SWAP 0/15960MB (cached 0MB) CPU [1%@ 2265,2%@ 2265,18%@ 2265,20%@ 2265,0%@ 2265,0%@ 2265,1%@ 2265,1%@ 2265] EMC_FREQ 2%@ 665 GR3D_FREQ 9%@ 318 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 10% AO@33.5C GPU@33.5C Tdiode@36.75C PMIC@100C AUX@33.5C CPU@36C thermal@33.95C Tboard@34C GPU 466/466 CPU 1554/714 SOC 1710/1368 CV 0/0 VDDRQ 310/310 SYS5V 2120/1952
RAM 2568/31921MB (lfb 6997x4MB) SWAP 0/15960MB (cached 0MB) CPU [12%@ 2265,7%@ 2225,23%@ 1804,40%@ 1804,2%@ 1804,1%@ 1804,3%@ 1804,1%@ 1804] EMC_FREQ 1%@ 2133 GR3D_FREQ 1%@ 828 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 15% AO@33.5C GPU@34.5C Tdiode@37C PMIC@100C AUX@33.5C CPU@36C thermal@34.05C Tboard@34C GPU 1552/647 CPU 1397/828 SOC 1863/1450 CV 0/0 VDDRQ 465/336 SYS5V 2320/2013
RAM 2569/31921MB (lfb 6997x4MB) SWAP 0/15960MB (cached 0MB) CPU [3%@ 1420,4%@ 1420,21%@ 1189,28%@ 1190,1%@ 1267,0%@ 1267,1%@ 1496,1%@ 1497] EMC_FREQ 2%@ 2133 GR3D_FREQ 99%@ 1198 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 10% AO@34C GPU@35.5C Tdiode@37.25C PMIC@100C AUX@33.5C CPU@36.5C thermal@34.7C Tboard@34C GPU 3412/1042 CPU 1241/887 SOC 2482/1597 CV 0/0 VDDRQ 465/354 SYS5V 2520/2085
RAM 2572/31921MB (lfb 6997x4MB) SWAP 0/15960MB (cached 0MB) CPU [11%@ 1190,10%@ 1190,35%@ 1190,11%@ 1190,2%@ 1190,8%@ 1267,2%@ 1267,7%@ 1495] EMC_FREQ 2%@ 2133 GR3D_FREQ 1%@ 1198 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 14% AO@34C GPU@34.5C Tdiode@37.25C PMIC@100C AUX@33.5C CPU@36C thermal@34.7C Tboard@34C GPU 2017/1163 CPU 1552/970 SOC 2484/1708 CV 0/0 VDDRQ 465/368 SYS5V 2440/2130
RAM 2572/31921MB (lfb 6997x4MB) SWAP 0/15960MB (cached 0MB) CPU [3%@ 1190,2%@ 1190,0%@ 1190,0%@ 1190,0%@ 1190,0%@ 1190,0%@ 1190,0%@ 1420] EMC_FREQ 1%@ 2133 GR3D_FREQ 0%@ 1198 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 2% AO@34C GPU@34.5C Tdiode@37.25C PMIC@100C AUX@33.5C CPU@36C thermal@34.7C Tboard@34C GPU 932/1138 CPU 466/914 SOC 2330/1777 CV 0/0 VDDRQ 310/362 SYS5V 2360/2155
RAM 2572/31921MB (lfb 6997x4MB) SWAP 0/15960MB (cached 0MB) CPU [4%@ 1190,5%@ 1190,1%@ 1190,2%@ 1190,0%@ 1267,0%@ 1267,0%@ 1420,1%@ 1497] EMC_FREQ 5%@408 GR3D_FREQ 0%@318 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 4% AO@34C GPU@35.5C Tdiode@37.25C PMIC@100C AUX@33.5C CPU@35.5C thermal@34.55C Tboard@34C GPU 776/1101 CPU 466/869 SOC 2176/1817 CV 0/0 VDDRQ 310/356 SYS5V 2280/2168
RAM 2418/31921MB (lfb 7003x4MB) SWAP 0/15960MB (cached 0MB) CPU [6%@ 1190,7%@ 1190,4%@ 1190,6%@ 1190,3%@ 1190,2%@ 1402,5%@ 1343,1%@ 1344] EMC_FREQ 3%@ 665 GR3D_FREQ 0%@ 318 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 6% AO@33.5C GPU@34C Tdiode@37C PMIC@100C AUX@33.5C CPU@35.5C thermal@34.25C Tboard@34C GPU 466/1044 CPU 933/875 SOC 1244/1765 CV 0/0 VDDRQ 311/352 SYS5V 1960/2149
RAM 2418/31921MB (lfb 7003x4MB) SWAP 0/15960MB (cached 0MB) CPU [2%@ 1190,1%@ 1190,2%@ 1190,2%@ 1190,0%@ 1190,0%@ 1344,0%@ 1343,0%@1343] EMC_FREQ 2%@665 GR3D_FREQ 1%@318 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 2% AO@34C GPU@33.5C Tdiode@37.25C PMIC@100C AUX@33.5C CPU@35.5C thermal@34.25C Tboard@34C GPU 466/995 CPU 466/841 SOC 1089/1709 CV 0/0 VDDRQ 311/349 SYS5V 1840/2123
RAM 2418/31921MB (lfb 7003x4MB) SWAP 0/15960MB (cached 0MB) CPU [15%@ 1190,8%@ 1190,10%@ 1190,7%@ 1190,1%@ 1190,1%@ 1344,2%@ 1267,1%@ 1267] EMC_FREQ 2%@ 665 GR3D_FREQ 0%@ 318 VIC_FREQ 0%@ 115 APE 150 MTS fg 0% bg 8% AO@33.5C GPU@33.5C Tdiode@37C PMIC@100C AUX@33.5C CPU@35.5C thermal@34.3C Tboard@34C GPU 466/955 CPU 622/824 SOC 933/1649 CV 0/0 VDDRQ 311/346 SYS5V 1760/2095

The release version:

R32 (release), REVISION: 5.1, GCID: 26202423, BOARD: t186ref, EABI: aarch64, DATE: Fri Feb 19 16:50:29 UTC 2021

Thanks

Hi,
It looks like the GPU load is low most of the time and reaches full load only once:

GR3D_FREQ 99%@ 1198

By default it prints at an interval of 1000 ms. You can try 50 ms to get more information.

So it looks like the code may not be using the GPU continuously. Could you check this?
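
One way to check how often the GPU is busy in a tegrastats log is to extract the GR3D_FREQ field from each line. The following is a minimal Python sketch, not part of the original thread; the function names and the 50% "busy" threshold are assumptions chosen for illustration.

```python
import re

# Matches the GPU field of a tegrastats line, e.g. "GR3D_FREQ 99%@ 1198".
# The whitespace after "@" varies in the log above, so it is optional here.
GR3D_RE = re.compile(r"GR3D_FREQ\s+(\d+)%(?:@\s*(\d+))?")


def parse_gr3d(line):
    """Return (load_percent, freq_mhz) for one tegrastats line, or None."""
    m = GR3D_RE.search(line)
    if m is None:
        return None
    freq = int(m.group(2)) if m.group(2) else None
    return int(m.group(1)), freq


def summarize(lines, busy_threshold=50):
    """Count samples, samples at or above busy_threshold, and the peak load."""
    samples = [s for s in map(parse_gr3d, lines) if s is not None]
    busy = sum(1 for load, _ in samples if load >= busy_threshold)
    peak = max((load for load, _ in samples), default=0)
    return {"samples": len(samples), "busy": busy, "peak": peak}
```

Passing the lines of a saved log file to summarize() gives a rough count of how many samples showed the GPU under load; with a 50 ms interval, a short burst of GPU work is much less likely to fall between samples.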

This is the tegrastats log file captured at a 50 ms interval:

tegrastatLogfile (49.2 KB)

Yes, my code uses the GPU in several different parts of the application. For example, I use the OpenCV CUDA libraries to detect and match keypoints with the SURF keypoint detector and brute-force matching.
Other parts of the application run on the CPU. Do you mean my laptop's CPU can be better than the AGX Xavier's CPU?
Note that tegrastats and my code did not start at the same time. I ran them from different terminals: first I started tegrastats, then I ran the code. You may see this difference in the log.

Detailed features of my laptop’s CPU: Intel Core i5-5200U CPU@2.20 GHz x 4.

Thanks again

Hi,
tegrastats is the tool for profiling system status, so please run it alongside the app to see the status while the app is running.

When comparing the CPUs, the laptop should be better than Xavier.
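
To see whether the CPU parts or the GPU parts dominate the ~3 seconds, it can help to time each stage of the pipeline separately on both machines. Below is a minimal Python sketch of such a stage timer, not taken from the thread: the stage names are hypothetical, the sleep calls stand in for real work, and it assumes each stage blocks until its result is ready. The same idea applies with any wall-clock timer if the application is written in C++.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
timings = {}


@contextmanager
def stage(name):
    """Add the wall-clock time of the enclosed block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = timings.get(name, 0.0) + elapsed


def report():
    """Return one formatted line per stage, slowest first."""
    total = sum(timings.values())
    rows = []
    for name, t in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * t / total if total else 0.0
        rows.append(f"{name:<20s} {t:8.3f} s {share:5.1f}%")
    return rows


# Hypothetical stages; replace the sleeps with the real calls.
with stage("detect_keypoints"):
    time.sleep(0.02)
with stage("match_keypoints"):
    time.sleep(0.01)

for row in report():
    print(row)
```

Comparing the per-stage numbers from the laptop and the Xavier shows which stages are faster on which machine, rather than only the total.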

Hi,

How can I run these two commands at the same time?
When I searched for this, all the results I found run them one after the other.

Hi,
You can open two terminal windows: one to run the app and the other to run sudo tegrastats.
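
If starting both by hand leaves too large a gap, a small wrapper script can start the monitor just before the app and stop it as soon as the app exits. This is a minimal Python sketch under stated assumptions, not something from the thread: run_with_monitor is a hypothetical helper, and ./stitching_app in the usage note is a placeholder for the real binary.

```python
import subprocess


def run_with_monitor(monitor_cmd, app_cmd, log_path):
    """Start monitor_cmd, run app_cmd to completion, then stop the monitor.

    The monitor's output is written to log_path.
    Returns the exit code of app_cmd.
    """
    with open(log_path, "w") as log:
        monitor = subprocess.Popen(
            monitor_cmd, stdout=log, stderr=subprocess.STDOUT
        )
        try:
            app = subprocess.run(app_cmd)
        finally:
            # Stop the monitor even if the app could not be started.
            monitor.terminate()
            try:
                monitor.wait(timeout=5)
            except subprocess.TimeoutExpired:
                monitor.kill()
                monitor.wait()
    return app.returncode
```

A call might look like run_with_monitor(["tegrastats", "--interval", "50"], ["./stitching_app"], "tegrastats.log"). The command is given without sudo here because an unprivileged script generally cannot stop a process that sudo started; if tegrastats needs root on your release, running the whole script with sudo is one option.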

Hi,
That is what I did. I only meant that they did not start at exactly the same time.
The log file I mentioned covers the whole run of the application.