Performance drop after AGX Orin update

Good morning,
I have a question regarding the performance of the Jetson Orin AGX.
I am working on a Python project that uses object detection algorithms, leveraging YOLOv8 models in the ONNX format.
Up until now, I have been running the algorithm on JetPack 5.1.1, and I have been trying to port it to JetPack 6.0, with the new versions of CUDA and TensorRT.


Below are the old configurations I have used:

Ubuntu (20.04)
Python 3.8
Jetpack 5.1.1 – sudo apt show nvidia-jetpack -a:

Package: nvidia-jetpack
Version: 5.1.1-b56
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-jetpack-runtime (= 5.1.1-b56), nvidia-jetpack-dev (= 5.1.1-b56)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Download-Size: 29,3 kB
APT-Sources: … common r35.3/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

CUDA: 11.4.315
cuDNN: 8.6.0.166
TensorRT: 8.5.2.2

with these versions of torch and onnxruntime-gpu:

torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl
onnxruntime_gpu-1.12.1-cp38-cp38-linux_aarch64.whl


Below is the new configuration:
Ubuntu (22.04)
Python 3.10
Jetpack 6.0

Package: nvidia-jetpack
Version: 6.0+b106
Priority: standard
Section: metapackages
Source: nvidia-jetpack (6.0)
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-jetpack-runtime (= 6.0+b106), nvidia-jetpack-dev (= 6.0+b106)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Download-Size: 29,3 kB
APT-Manual-Installed: yes
APT-Sources:…/jetson/common r36.3/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

Package: nvidia-jetpack
Version: 6.0+b87
Priority: standard
Section: metapackages
Source: nvidia-jetpack (6.0)
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-jetpack-runtime (= 6.0+b87), nvidia-jetpack-dev (= 6.0+b87)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Download-Size: 29,3 kB
APT-Sources:…/jetson/common r36.3/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

CUDA: 12.2.140
cuDNN: 8.9.4.25
TensorRT: 8.6.2.3

with these versions of torch and onnxruntime-gpu:

torch-2.2.0a0+6a974be.nv23.11-cp310-cp310-linux_aarch64.whl
onnxruntime_gpu-1.19.0-cp310-cp310-linux_aarch64.whl

In the project, I’ve created an inference session using the same model with the following commands:

providers = [('TensorrtExecutionProvider', {'trt_fp16_enable': True, 'trt_engine_cache_enable': True, 'trt_engine_cache_path': 'models/engines'}), ('CUDAExecutionProvider', {})]
self.session = ort.InferenceSession(weigths_path, providers=providers)

What I’ve noticed is that with the JetPack 5.1.1 configuration, I get better inference performance, around 35 ms, while with the JetPack 6.0 configuration, it’s about 50 ms.
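For anyone comparing the two setups, it may help to time inference the same way on both. Below is a minimal Python sketch of a timing helper, not the code used for the numbers above. The helper name `benchmark` and its `warmup` and `runs` parameters are made up for illustration. Warm-up calls are discarded because, with the TensorRT provider, the first calls can include engine building or cache loading.

```python
import statistics
import time


def benchmark(run_once, warmup=10, runs=100):
    """Time a zero-argument callable; return (mean_ms, median_ms)."""
    # Warm-up calls are not measured (first calls may be much slower).
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(samples), statistics.median(samples)


# Stand-in workload so the sketch runs anywhere; replace it with the
# real inference call on the device.
mean_ms, median_ms = benchmark(lambda: sum(range(10000)), warmup=2, runs=5)
print(f"mean {mean_ms:.2f} ms, median {median_ms:.2f} ms")
```

On the device, the callable would wrap the session call, for example `benchmark(lambda: session.run(None, {input_name: frame}))`, where `input_name` and `frame` are placeholders for the model's input name and a preprocessed image.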

I’ve observed that the engine files created by TensorRT are different; in JetPack 6.0 in particular, an “sm87” suffix appears. Do you have any suggestions on what I can do to achieve at least the same performance as before?

Hi,

Have you maximized the performance before testing?
This can be done via:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Also, could you double-check whether the GPU utilization reaches its maximum (~99%) in both cases?
Thanks.

Yes, I maximized the performance on both devices.
I ran the Jetson Power GUI, and on both devices the GPU reaches 99% usage at some instants (not constantly).

The big difference I noticed between the two versions is in the EMC:

In JetPack 6.0 I have a frequency of ~204 MHz with 142% load.

In JetPack 5.1.1 I have a frequency of ~2133 MHz with 15% load.

Is this normal?

And what about the “sm87” suffix on JetPack 6.0? Could this be the cause of the performance drop?

Thank you.

Hi,

Could you double-check the value with tegrastats?

$ sudo tegrastats

With r36.3.0 (JetPack 6.0), the EMC clock on Orin should be 3199 MHz.

We see this value on our own Orin board with JetPack 6.0:

nvidia@tegra-ubuntu:~$ cat /etc/nv_tegra_release
# R36 (release), REVISION: 3.0, GCID: 36923193, BOARD: generic, EABI: aarch64, DATE: Fri Jul 19 23:24:25 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
nvidia@tegra-ubuntu:~$ sudo tegrastats
03-24-2025 07:41:50 RAM 15522/62841MB (lfb 1927x4MB) SWAP 787/31421MB (cached 2MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,1%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 58%@3199 GR3D_FREQ 99%@[1290,1289] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@74.906C tboard@61.75C soc2@71.656C tdiode@70.125C soc0@72.437C gpu@74.718C tj@74.718C soc1@71.343C VDD_GPU_SOC 45592mW/45592mW VDD_CPU_CV 376mW/376mW VIN_SYS_5V0 10718mW/10718mW VDDQ_VDD2_1V8AO 4629mW/4629mW
03-24-2025 07:41:51 RAM 15521/62841MB (lfb 1927x4MB) SWAP 787/31421MB (cached 2MB) CPU [0%@729,0%@729,2%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 59%@3199 GR3D_FREQ 99%@[1293,1294] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@74.843C tboard@61.75C soc2@71.718C tdiode@70.25C soc0@72.437C gpu@74.687C tj@74.843C soc1@71.281C VDD_GPU_SOC 45969mW/45780mW VDD_CPU_CV 376mW/376mW VIN_SYS_5V0 10819mW/10768mW VDDQ_VDD2_1V8AO 4629mW/4629mW
03-24-2025 07:41:52 RAM 15522/62841MB (lfb 1927x4MB) SWAP 787/31421MB (cached 2MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,1%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 59%@3199 GR3D_FREQ 99%@[1297,1298] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@74.843C tboard@61.75C soc2@71.843C tdiode@70.25C soc0@72.437C gpu@75.343C tj@75.343C soc1@71.406C VDD_GPU_SOC 45969mW/45843mW VDD_CPU_CV 376mW/376mW VIN_SYS_5V0 10819mW/10785mW VDDQ_VDD2_1V8AO 4629mW/4629mW
03-24-2025 07:41:53 RAM 15521/62841MB (lfb 1927x4MB) SWAP 787/31421MB (cached 2MB) CPU [0%@729,0%@729,1%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 59%@3199 GR3D_FREQ 96%@[1300,1294] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@74.656C tboard@61.75C soc2@71.781C tdiode@70.25C soc0@72.593C gpu@74.375C tj@74.781C soc1@71.406C VDD_GPU_SOC 45216mW/45686mW VDD_CPU_CV 377mW/376mW VIN_SYS_5V0 10617mW/10743mW VDDQ_VDD2_1V8AO 4536mW/4605mW
03-24-2025 07:41:54 RAM 15521/62841MB (lfb 1927x4MB) SWAP 787/31421MB (cached 2MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 58%@3199 GR3D_FREQ 99%@[1296,1297] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@74.968C tboard@61.75C soc2@71.937C tdiode@70.25C soc0@72.656C gpu@76.625C tj@76.375C soc1@71.406C VDD_GPU_SOC 46346mW/45818mW VDD_CPU_CV 376mW/376mW VIN_SYS_5V0 10617mW/10718mW VDDQ_VDD2_1V8AO 4536mW/4591mW

Could you share how you set up your device with JetPack 6.0?
Did you flash it with SDK Manager?

Thanks.

Hi,

These are the results:

$ cat /etc/nv_tegra_release

# R36 (release), REVISION: 3.0, GCID: 36923193, BOARD: generic, EABI: aarch64, DATE: Fri Jul 19 23:24:25 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
$ sudo tegrastats

03-24-2025 14:55:23 RAM 2108/30697MB (lfb 2x4MB) SWAP 0/15349MB (cached 0MB) CPU [2%@2201,1%@729,0%@729,1%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 2%@665 GR3D_FREQ 1%@[305,305] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@44.156C tboard@33.625C soc2@41.531C tdiode@34.75C soc0@41.875C gpu@40.25C tj@44.312C soc1@40.562C VDD_GPU_SOC 2782mW/2782mW VDD_CPU_CV 795mW/795mW VIN_SYS_5V0 3310mW/3310mW VDDQ_VDD2_1V8AO 702mW/702mW
03-24-2025 14:55:24 RAM 2108/30697MB (lfb 2x4MB) SWAP 0/15349MB (cached 0MB) CPU [6%@2201,10%@729,0%@729,7%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 2%@665 GR3D_FREQ 0%@[305,305] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@44.281C tboard@33.625C soc2@41.562C tdiode@34.75C soc0@42C gpu@40.062C tj@44.281C soc1@40.5C VDD_GPU_SOC 2782mW/2782mW VDD_CPU_CV 795mW/795mW VIN_SYS_5V0 3511mW/3410mW VDDQ_VDD2_1V8AO 702mW/702mW
03-24-2025 14:55:25 RAM 2109/30697MB (lfb 2x4MB) SWAP 0/15349MB (cached 0MB) CPU [4%@729,5%@729,5%@729,5%@729,0%@729,0%@729,3%@729,0%@729,2%@729,0%@729,2%@729,0%@729] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[305,305] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@44.281C tboard@33.625C soc2@41.593C tdiode@34.875C soc0@41.937C gpu@40.343C tj@44.437C soc1@40.562C VDD_GPU_SOC 2782mW/2782mW VDD_CPU_CV 795mW/795mW VIN_SYS_5V0 3511mW/3444mW VDDQ_VDD2_1V8AO 802mW/735mW
03-24-2025 14:55:26 RAM 2108/30697MB (lfb 2x4MB) SWAP 0/15349MB (cached 0MB) CPU [5%@729,14%@729,5%@729,4%@729,0%@908,1%@729,0%@729,0%@729,2%@729,1%@729,0%@729,0%@729] EMC_FREQ 2%@665 GR3D_FREQ 0%@[305,305] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@44.25C tboard@33.625C soc2@41.562C tdiode@34.875C soc0@41.968C gpu@40.062C tj@44.25C soc1@40.687C VDD_GPU_SOC 2782mW/2782mW VDD_CPU_CV 795mW/795mW VIN_SYS_5V0 3410mW/3435mW VDDQ_VDD2_1V8AO 702mW/727mW
03-24-2025 14:55:27 RAM 2109/30697MB (lfb 2x4MB) SWAP 0/15349MB (cached 0MB) CPU [1%@729,2%@729,4%@729,5%@729,1%@1420,0%@729,2%@729,0%@729,0%@729,6%@729,1%@729,0%@729] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[305,305] NVENC off NVDEC off NVJPG off NVJPG1 off VIC off OFA off NVDLA0 off NVDLA1 off PVA0_FREQ off APE 174 cpu@44.312C tboard@33.625C soc2@41.625C tdiode@34.875C soc0@41.937C gpu@40C tj@44.312C soc1@40.625C VDD_GPU_SOC 2782mW/2782mW VDD_CPU_CV 795mW/795mW VIN_SYS_5V0 3410mW/3430mW VDDQ_VDD2_1V8AO 702mW/722mW

Yes, I used the latest version of SDK Manager on a virtual machine with Ubuntu 20.04.

What do you mean by "how you set up your device with JetPack 6.0"?

Thanks for your help.

Hi,

The tegrastats output indicates that the device is still using dynamic frequency scaling: the clocks are not fixed at their maximum.
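As a rough illustration of how the two captures above differ, here is a small Python sketch that pulls the EMC frequency out of tegrastats lines. The function name `emc_frequencies` is made up for this example, and the sample lines are shortened excerpts of the output posted in this thread. A single value suggests a fixed clock, while several values suggest the clock is still scaling.

```python
import re

# Matches e.g. "EMC_FREQ 58%@3199" and captures the frequency after "@".
EMC_RE = re.compile(r"EMC_FREQ \d+%@(\d+)")


def emc_frequencies(lines):
    """Return the sorted distinct EMC frequencies found in tegrastats lines."""
    found = set()
    for line in lines:
        match = EMC_RE.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


# Shortened excerpts of the two tegrastats captures in this thread.
reference_board = [
    "EMC_FREQ 58%@3199 GR3D_FREQ 99%@[1290,1289]",
    "EMC_FREQ 59%@3199 GR3D_FREQ 99%@[1293,1294]",
]
reported_board = [
    "EMC_FREQ 2%@665 GR3D_FREQ 1%@[305,305]",
    "EMC_FREQ 0%@2133 GR3D_FREQ 0%@[305,305]",
]

print(emc_frequencies(reference_board))  # [3199]
print(emc_frequencies(reported_board))   # [665, 2133]
```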

Could you try running the commands below again?
Please note that they need to be executed in order:
first set nvpmodel to mode 0, then fix the clocks at their maximum with jetson_clocks.

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

“SDK Manager” is the answer we were looking for when we asked "how you set up your device with JetPack 6.0".

Thanks.

I managed to achieve the same performance again.

The two commands you provided need to be executed in sequence and at every startup (I created a Linux service to handle this for me).
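The service itself is not shown here, so the following is only a sketch of the kind of Python script such a service could call at boot; it is not the exact setup used. The function name `apply_max_performance` and the injectable `run` parameter are assumptions made for illustration, and the script is assumed to run as root, since both commands were run with sudo above.

```python
import subprocess

# The two commands from this thread, in the required order.
COMMANDS = [
    ["nvpmodel", "-m", "0"],
    ["jetson_clocks"],
]


def apply_max_performance(run=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for cmd in COMMANDS:
        run(cmd, check=True)
```

Calling `apply_max_performance()` from a script started at boot would then apply both settings in sequence. The `run` parameter only exists so the ordering can be checked without touching the real clocks.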

I also tested the performance of my scripts with YOLO networks, and now the performance is comparable.

Thank you very much! I really appreciate your help!
