Hello NVIDIA team,
I would like to report a reproducible TensorRT accuracy issue across two NVIDIA platforms.
I converted the same VGGT model using the following pipeline:
PyTorch → ONNX → TensorRT engine
The issue is that the inference results are platform-dependent:
So this is not a build failure, but rather an inference accuracy mismatch on Jetson Orin NX only.
At the moment, this looks like a platform-specific TensorRT issue related to one or more of the following:
- TensorRT / CUDA / cuDNN stack differences between JetPack 6.2 and JetPack 7.0
- GPU architecture differences (Ampere vs Grace Blackwell)
- tactic selection
- precision handling
- unsupported or numerically unstable kernel path on Jetson
I have provided the reproduction details, test code, both ONNX models and TensorRT engines, input/output dumps, and build logs at the following link: https://ifbs-my.sharepoint.com/:f:/g/personal/k_huang_innofaith_com/IgDpzC53KfShSK2K8Az7yvT8Ab7YJwRAJloHV2x7wPAwu4Y?e=1lo6wZ
Please let me know if you need any further information. Thank you!
Hi,
Thanks for sharing the files.
We downloaded the data successfully and are now working on reproducing this issue internally.
We will get back to you once there is any progress.
Thanks.
Hi,
Confirmed that we can reproduce this issue internally.
Running the model in FP32 mode (on AGX Orin) also reproduces the accuracy drop.
We need to check this issue with our internal team.
We will get back to you once we have a further update.
Thanks.
I split the full model into four components, as shown in the graph above, and ran each part separately through Polygraphy. From this, I was able to identify a clear numerical distribution mismatch at the output of the aggregator component, rather than in the heads.
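For anyone repeating this per-component comparison on their own output dumps, the check can be sketched as below. This is a minimal, dependency-free sketch and not the script I actually used: the function name `compare_outputs` and the chosen metrics are my own illustration, and it assumes each output has already been flattened into a plain sequence of floats.

```python
import math


def compare_outputs(reference, candidate):
    """Compare two flattened output tensors and return simple mismatch metrics.

    `reference` and `candidate` are sequences of numbers of equal length,
    e.g. the same output tensor dumped on two different platforms.
    """
    ref = [float(x) for x in reference]
    cand = [float(x) for x in candidate]
    if len(ref) != len(cand):
        raise ValueError("outputs must have the same number of elements")
    if not ref:
        raise ValueError("outputs must not be empty")

    # Element-wise absolute error.
    abs_diffs = [abs(r - c) for r, c in zip(ref, cand)]

    # Cosine similarity is insensitive to a global scale factor, so a low
    # value points at a genuine change in the output distribution.
    dot = sum(r * c for r, c in zip(ref, cand))
    denom = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(c * c for c in cand))
    cosine = dot / denom if denom > 0 else float("nan")

    return {
        "max_abs_diff": max(abs_diffs),
        "mean_abs_diff": sum(abs_diffs) / len(abs_diffs),
        "cosine_similarity": cosine,
    }


if __name__ == "__main__":
    # Toy values only; real inputs would be the dumped component outputs.
    print(compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.5]))
```

Running this once per component output (aggregator, then each head) shows which stage first diverges between the two platforms. If the dumps are NumPy arrays, something like `array.ravel().tolist()` would produce the flat input this sketch expects.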
I have updated all related files in the OneDrive link above under the folder named “sub-onnx”.
I hope this helps further narrow down the issue. In the meantime, I will continue trying to localize the problem more precisely within the aggregator or possibly within DINO.
Thank you very much for your support.
Hi,
Thanks for sharing this.
We tried to open the OneDrive link but got the following error:
Selected user account does not exist in tenant '***' and cannot access the application '***' in that tenant. The account needs to be added as an external user in the tenant first. Please use a different account.
Could you try enabling public access?
Thanks.
Hi,
Confirmed that we can access the files with the new link.
We will share this information with our internal team.
Thanks.
Hi,
Thanks a lot for your patience.
We have confirmed that this issue is fixed after upgrading TensorRT to v10.16.
Below are the steps for your reference:
CUDA
$ mkdir -p ~/cuda-compat-orin && cd ~/cuda-compat-orin
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/cuda-compat-orin-13-2_13.2.44290101-1_arm64.deb -O cuda-compat-orin-13-2.deb
$ dpkg-deb -x cuda-compat-orin-13-2.deb extracted/
$ export LD_LIBRARY_PATH=/home/nvidia/cuda-compat-orin/extracted/usr/local/cuda-13.2/compat_orin${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
TensorRT
$ wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.16.1/tars/TensorRT-10.16.1.11.Linux.aarch64-gnu.cuda-13.2.tar.gz
$ tar -xzf TensorRT-10.16.1.11.Linux.aarch64-gnu.cuda-13.2.tar.gz
$ pip install numpy pycuda pillow TensorRT-10.16.1.11/python/tensorrt-10*-cp310-*.whl
Run
$ LD_LIBRARY_PATH=TensorRT-10.16.1.11/lib:cuda-compat-orin/extracted/usr/local/cuda-13.2/compat_orin/:$LD_LIBRARY_PATH TensorRT-10.16.1.11/bin/trtexec --onnx=vggt_fp16.onnx --saveEngine=vggt_fp16.engine --minShapes=input_images:4x3x518x518 --optShapes=input_images:4x3x518x518 --maxShapes=input_images:4x3x518x518 --fp16
$ LD_LIBRARY_PATH=TensorRT-10.16.1.11/lib:cuda-compat-orin/extracted/usr/local/cuda-13.2/compat_orin/:$LD_LIBRARY_PATH python test_engine.py --engine vggt_fp16.engine --input_dir custom_data --output_dir output_pcd_trt
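Because the steps above depend on the Python process picking up the newly installed TensorRT wheel rather than the JetPack-provided one, a quick version check before running the test script may be useful. The following is only a sketch: the helper names `parse_version` and `meets_minimum` are hypothetical, and the (10, 16) threshold is taken from the version mentioned above.

```python
import importlib

# Minimum TensorRT version reported above as containing the fix.
MIN_VERSION = (10, 16)


def parse_version(version):
    """Turn a dotted version string such as '10.16.1.11' into a tuple of ints.

    Parsing stops at the first non-numeric component.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version, minimum=MIN_VERSION):
    """Return True if `version` is at least `minimum` (compared component-wise)."""
    return parse_version(version)[: len(minimum)] >= minimum


if __name__ == "__main__":
    try:
        trt = importlib.import_module("tensorrt")
    except ImportError:
        print("tensorrt is not importable in this environment")
    else:
        print("TensorRT version:", trt.__version__)
        print("Meets minimum:", meets_minimum(trt.__version__))
```

If this reports an older version, the wheel installed from the tarball is probably not the one being imported.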
Please note that TensorRT 10.16 and CUDA 13.2 will be officially available with JetPack 7.2.
Thanks.
Hi,
Thanks for your update!
May I ask what hardware you used for the model conversion (ONNX to TensorRT engine) and how much memory it consumed? I eventually want to deploy the engine on a Jetson Orin Nano 8GB, so this information would help me a lot.
Thank you very much!
Best,
Kai
Hi,
We tested this on Orin NX with JetPack 6.2.2.
Thanks.
Great. Thank you very much!