TensorRT gives different results on Jetson Orin

Hi,
We’ve been using TensorRT for several years with different neural networks on different platforms: Jetson (Xavier), desktop (2080), server (T4), and others.
We’ve just started supporting Jetson Orin with our current models and have found an odd issue.
Some networks return different values on Jetson Orin AGX with JetPack 5.1.

I have created an example using pose estimation to reproduce the problem. You can download the code from the following link:

Steps to reproduce the problem:
1 - Install onnxruntime
$> pip3 install onnxruntime
2 - Build plan file using trtexec
$> ./trtexec_pose.sh build
3 - Run plan file and save outputs
$> ./trtexec_pose.sh infer
4 - Get results for the same input using onnxruntime
$> python3 onnxruntime_pose.py
5 - Finally compare trtexec and onnxruntime results
$> python3 check_results.py

The same steps on Jetson Xavier AGX (JetPack 4.5), Tesla T4, or RTX 2080ti … give equivalent results between onnxruntime and TensorRT.
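
For readers without the download, the comparison in step 5 can be sketched roughly as below. This is not the actual check_results.py; the function name `compare_outputs`, the tolerance `atol`, and the meaning of the printed index pair (output index, flat element index) are assumptions made for illustration. It takes two already-loaded output tensors and reports elements whose absolute difference exceeds the tolerance.

```python
import numpy as np

def compare_outputs(reference, candidate, atol=1e-3, output_index=0, max_report=5):
    """Compare two output tensors element by element (hypothetical helper).

    reference / candidate: array-likes with the same number of elements,
    e.g. the onnxruntime output and the TensorRT output for one binding.
    Returns the max and mean absolute error and the mismatch count.
    """
    reference = np.asarray(reference, dtype=np.float64).ravel()
    candidate = np.asarray(candidate, dtype=np.float64).ravel()
    if reference.shape != candidate.shape:
        raise ValueError("outputs have different sizes")

    diff = np.abs(reference - candidate)
    bad = np.flatnonzero(diff > atol)

    # Print only the first few offending elements to keep the log short.
    for i in bad[:max_report]:
        print(f"--- output mismatch: ({output_index}, {i}) >>> "
              f"{reference[i]} vs {candidate[i]} | {diff[i]}")

    return {"max": float(diff.max()),
            "mean": float(diff.mean()),
            "mismatches": int(bad.size)}

if __name__ == "__main__":
    stats = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.5], atol=1e-3)
    print("err max", stats["max"], "mean", stats["mean"])
```

An absolute tolerance is used here for simplicity; for outputs spanning several orders of magnitude, a relative tolerance (for example via `np.isclose` with `rtol`) may be the better choice.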

Hi,

Thanks. We are going to give it a try.
Have you also tested this on JetPack 5.1.1, which was released recently?

Thanks.

Hi,

We have confirmed this issue can be reproduced on Orin with JetPack 5.1.1.
Our internal team is checking on this. We will share more information later.

--- output mismatch: (2, 28960) >>> 2.3787856101989746 vs 2.37354 | 0.005245610198974404
--- output mismatch: (2, 28962) >>> 124.85875701904297 vs 124.846 | 0.012757019042965112
--- output mismatch: (2, 28963) >>> 91.82755279541016 vs 91.7947 | 0.03285279541015029
--- output mismatch: (2, 28964) >>> 0.26270583271980286 vs 0.261643 | 0.001062832719802842
--- output mismatch: (2, 28966) >>> 0.11150957643985748 vs 0.109445 | 0.0020645764398574823
...

Thanks.

Hi,

Thanks for your patience.

We found that this issue is related to TF32 kernels.
As a workaround, please run trtexec with the --noTF32 flag.

The error after applying the workaround is:

err max 0.000645141601566479 mean 3.0629781962014103e-06
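
To illustrate why TF32 kernels can shift results at this scale, here is a rough NumPy sketch. TF32 keeps 10 explicit mantissa bits instead of float32's 23. The helper `to_tf32` below is hypothetical and emulates that by simply truncating the low 13 mantissa bits of the inputs; the rounding behaviour and accumulation order of real hardware may differ, so treat the numbers as indicative only.

```python
import numpy as np

def to_tf32(x):
    """Emulate TF32 input precision (hypothetical helper, truncation only).

    Clears the 13 low mantissa bits of each float32 value, leaving the
    sign, the 8 exponent bits, and the top 10 mantissa bits.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 256)).astype(np.float32)
b = rng.standard_normal((256, 64)).astype(np.float32)

ref = a @ b                       # full float32 inputs
approx = to_tf32(a) @ to_tf32(b)  # reduced-precision inputs, float32 accumulate

err = np.abs(ref - approx)
print("err max", err.max(), "mean", err.mean())
```

Each truncated input carries a relative error of up to 2^-10 (about 1e-3), and that error propagates through every layer, which is in line with the size of the mismatches reported earlier in this thread. If the engine is built through the TensorRT Python API rather than trtexec, the analogous step should be clearing `trt.BuilderFlag.TF32` on the builder config, but please verify this against the documentation for your TensorRT version.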

Thanks.

Thank you for the workaround.

Hi,

Did --noTF32 fix your issue?
Thanks.