Good morning,
I have a question regarding the performance of the Jetson AGX Orin.
I am working on a Python project that uses object detection algorithms, leveraging YOLOv8 models in the ONNX format.
Until now I have been running the algorithm on JetPack 5.1.1, and I am now trying to port it to JetPack 6.0, with the newer versions of CUDA and TensorRT.
Below is the old configuration I have used:
Ubuntu (20.04)
Python 3.8
JetPack 5.1.1 – sudo apt show nvidia-jetpack -a:
Package: nvidia-jetpack
Version: 5.1.1-b56
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-jetpack-runtime (= 5.1.1-b56), nvidia-jetpack-dev (= 5.1.1-b56)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Download-Size: 29,3 kB
APT-Sources: … common r35.3/main arm64 Packages
Description: NVIDIA Jetpack Meta Package
CUDA: 11.4.315
cuDNN: 8.6.0.166
TensorRT: 8.5.2.2
with these versions of torch and onnxruntime-gpu:
torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl
onnxruntime_gpu-1.12.1-cp38-cp38-linux_aarch64.whl
Below is the new configuration:
Ubuntu (22.04)
Python 3.10
JetPack 6.0 – sudo apt show nvidia-jetpack -a:
Package: nvidia-jetpack
Version: 6.0+b106
Priority: standard
Section: metapackages
Source: nvidia-jetpack (6.0)
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-jetpack-runtime (= 6.0+b106), nvidia-jetpack-dev (= 6.0+b106)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Download-Size: 29,3 kB
APT-Manual-Installed: yes
APT-Sources:…/jetson/common r36.3/main arm64 Packages
Description: NVIDIA Jetpack Meta Package
Package: nvidia-jetpack
Version: 6.0+b87
Priority: standard
Section: metapackages
Source: nvidia-jetpack (6.0)
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-jetpack-runtime (= 6.0+b87), nvidia-jetpack-dev (= 6.0+b87)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Download-Size: 29,3 kB
APT-Sources:…/jetson/common r36.3/main arm64 Packages
Description: NVIDIA Jetpack Meta Package
CUDA: 12.2.140
cuDNN: 8.9.4.25
TensorRT: 8.6.2.3
with these versions of torch and onnxruntime-gpu:
torch-2.2.0a0+6a974be.nv23.11-cp310-cp310-linux_aarch64.whl
onnxruntime_gpu-1.19.0-cp310-cp310-linux_aarch64.whl
In the project, I’ve created an inference session from the same model with the following code:
providers = [
    ('TensorrtExecutionProvider', {'trt_fp16_enable': True, 'trt_engine_cache_enable': True, 'trt_engine_cache_path': 'models/engines'}),
    ('CUDAExecutionProvider', {})  # fallback if the TensorRT EP cannot handle a node
]
self.session = ort.InferenceSession(weights_path, providers=providers)
What I’ve noticed is that with the JetPack 5.1.1 configuration, I get better inference performance, around 35 ms, while with the JetPack 6.0 configuration, it’s about 50 ms.
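For context, a minimal sketch of how such a measurement can be reproduced (assuming a standard 1x3x640x640 YOLOv8 input; the warm-up and iteration counts are arbitrary, not the exact project code):

import time
import numpy as np

# self.session is the InferenceSession created above.
input_name = self.session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # placeholder input

# Warm-up, so the TensorRT engine is built or loaded from cache before timing.
for _ in range(10):
    self.session.run(None, {input_name: dummy})

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
    self.session.run(None, {input_name: dummy})
print(f"mean latency: {(time.perf_counter() - start) / n_runs * 1000:.1f} ms")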
I’ve also observed that the engine files created by TensorRT are different; in particular, on JetPack 6.0 the cached engine file names carry an "sm87" suffix. Do you have any suggestions on what I can do to achieve at least the same performance as before?
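For completeness, the execution providers the session actually resolves to can be checked like this (illustrative snippet, not taken from the project; it only prints the provider lists):

import onnxruntime as ort

# Providers actually attached to this session (order matters: TensorRT first, CUDA as fallback).
print(self.session.get_providers())
# Providers available in this onnxruntime-gpu build.
print(ort.get_available_providers())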