Onnx to TensorRT mismatch

alon2 · December 12, 2023, 12:49pm

We are facing a challenge with TensorRT on the NVIDIA Orin NX platform. When converting models from ONNX to TensorRT, our team has run into an output mismatch, even though we build the engines in FP32 (32-bit floating point), where we did not expect accuracy discrepancies.

Here are the details of our situation:

Hardware: NVIDIA Orin Dev kit, running as Orin NX 16GB
Software: JetPack 5.1.2 with its bundled TensorRT 8.5, and the latest compatible version of ONNX.

Issue Description: After converting our model from ONNX to TensorRT, we observed mismatched outputs. This is particularly concerning because it directly affects the accuracy and reliability of our model’s predictions.
Steps Taken: To narrow down the issue, we used Polygraphy and pinpointed the minimal subgraph that reproduces the problem. We have also been building in FP32 throughout, on the assumption that this would avoid the accuracy issues typically associated with converting from ONNX to TensorRT.
Reproduce: with the attached model, run polygraphy run initial_reduced.onnx --trt --onnxrt
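For readers comparing the numbers in the log below: polygraphy run compares outputs element-wise, by default with abs=1e-05 and rel=1e-05. Going by Polygraphy's description of its elemwise error statistic, and by the log's note that "requirements may be lower if both abs/rel tolerances are set", an element appears to count as a mismatch only when its absolute difference exceeds atol and its relative difference also exceeds rtol. The sketch below restates that rule in plain Python under that assumption. elemwise_mismatches is a hypothetical helper, not Polygraphy's code, and using |ref| as the relative-difference denominator is also an assumption.

```python
import math


def elemwise_mismatches(out, ref, atol=1e-5, rtol=1e-5):
    """Return indices of elements that exceed BOTH tolerances.

    Assumed rule (sketch, not Polygraphy's implementation): an element passes
    if |out - ref| <= atol OR |out - ref| / |ref| <= rtol.
    """
    failing = []
    for i, (o, r) in enumerate(zip(out, ref)):
        absdiff = abs(o - r)
        if absdiff == 0.0:
            reldiff = 0.0
        elif r == 0.0:
            reldiff = math.inf  # any nonzero difference against a zero reference
        else:
            reldiff = absdiff / abs(r)
        if absdiff > atol and reldiff > rtol:
            failing.append(i)
    return failing


# Hypothetical values, chosen only to illustrate the rule.
ref = [2.0, 1e-6, 0.5, 0.3]
out = [2.0 + 1.5e-5, 2e-6, 0.5 + 2e-5, 0.3 + 5e-6]
print(elemwise_mismatches(out, ref))  # prints [2]
```

In the made-up data, the second element has a relative difference of 1.0 but still passes because its absolute difference (1e-6) is within atol, and the first element exceeds atol but passes on its small relative difference; only the third element exceeds both.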

Our objective is to achieve a precise and consistent conversion of our models from ONNX to TensorRT, without facing output accuracy issues. I am seeking insights, advice, or similar experiences from the community regarding this matter.

If anyone has faced a similar situation or has suggestions on troubleshooting methods, configuration adjustments, or updates that might aid in resolving this issue, your input would be highly valued.

Thanks!

Extra information:

Polygraphy log

$ polygraphy run initial_reduced.onnx --trt --onnxrt
[I] RUNNING | Command: /root/.local/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
[I] trt-runner-N0-12/12/23-10:08:30 | Activating and starting inference
[W] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I]     Configuring with profiles:[ Profile 0: {/model.4/cv1/conv/Conv_output_0 [min=[1, 64, 24, 180], opt=[1, 64, 24, 180], max=[1, 64, 24, 180]]} ]
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 15824.66 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
[I] Finished engine building in 51.652 seconds
[I] trt-runner-N0-12/12/23-10:08:30
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] trt-runner-N0-12/12/23-10:08:30
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] trt-runner-N0-12/12/23-10:08:30 | Completed 1 iteration(s) in 1444 ms | Average inference time: 1444 ms.
[I] onnxrt-runner-N0-12/12/23-10:08:30 | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-12/12/23-10:08:30
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] onnxrt-runner-N0-12/12/23-10:08:30
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] onnxrt-runner-N0-12/12/23-10:08:30 | Completed 1 iteration(s) in 4.776 ms | Average inference time: 4.776 ms.
[I] Accuracy Comparison | trt-runner-N0-12/12/23-10:08:30 vs. onnxrt-runner-N0-12/12/23-10:08:30
[I]     Comparing Output: '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180)) with '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-12/12/23-10:08:30: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18074, min=-0.27846 at (0, 1, 0, 28), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.278 , -0.0407) |      94489 | ########################################
                (-0.0407, 0.197  ) |      23527 | #########
                (0.197  , 0.435  ) |      12451 | #####
                (0.435  , 0.672  ) |       4480 | #
                (0.672  , 0.91   ) |       1967 | 
                (0.91   , 1.15   ) |        926 | 
                (1.15   , 1.39   ) |        317 | 
                (1.39   , 1.62   ) |         72 | 
                (1.62   , 1.86   ) |         10 | 
                (1.86   , 2.1    ) |          1 | 
[I]         onnxrt-runner-N0-12/12/23-10:08:30: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18073, min=-0.27846 at (0, 10, 3, 124), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.278 , -0.0407) |      94489 | ########################################
                (-0.0407, 0.197  ) |      23527 | #########
                (0.197  , 0.435  ) |      12451 | #####
                (0.435  , 0.672  ) |       4480 | #
                (0.672  , 0.91   ) |       1967 | 
                (0.91   , 1.15   ) |        926 | 
                (1.15   , 1.39   ) |        317 | 
                (1.39   , 1.62   ) |         72 | 
                (1.62   , 1.86   ) |         10 | 
                (1.86   , 2.1    ) |          1 | 
[I]         Error Metrics: /model.4/m.1/cv1/act/Mul_output_0
[I]             Minimum Required Tolerance: elemwise error | [abs=1.6212e-05] OR [rel=0.20325] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=1.4257e-06, std-dev=1.5692e-06, var=2.4625e-12, median=8.9407e-07, min=0 at (0, 0, 1, 42), max=1.6212e-05 at (0, 27, 21, 73), avg-magnitude=1.4257e-06
[I]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (0       , 1.62e-06) |      97545 | ########################################
                    (1.62e-06, 3.24e-06) |      26306 | ##########
                    (3.24e-06, 4.86e-06) |       7343 | ###
                    (4.86e-06, 6.48e-06) |       3949 | #
                    (6.48e-06, 8.11e-06) |       2447 | #
                    (8.11e-06, 9.73e-06) |        631 | 
                    (9.73e-06, 1.13e-05) |         17 | 
                    (1.13e-05, 1.3e-05 ) |          0 | 
                    (1.3e-05 , 1.46e-05) |          1 | 
                    (1.46e-05, 1.62e-05) |          1 | 
[I]             Relative Difference | Stats: mean=2.5755e-05, std-dev=0.00073652, var=5.4246e-07, median=4.5301e-06, min=0 at (0, 0, 1, 42), max=0.20325 at (0, 10, 8, 19), avg-magnitude=2.5755e-05
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (0     , 0.0203) |     138226 | ########################################
                    (0.0203, 0.0406) |          5 | 
                    (0.0406, 0.061 ) |          6 | 
                    (0.061 , 0.0813) |          1 | 
                    (0.0813, 0.102 ) |          1 | 
                    (0.102 , 0.122 ) |          0 | 
                    (0.122 , 0.142 ) |          0 | 
                    (0.142 , 0.163 ) |          0 | 
                    (0.163 , 0.183 ) |          0 | 
                    (0.183 , 0.203 ) |          1 | 
[E]         FAILED | Output: '/model.4/m.1/cv1/act/Mul_output_0' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['/model.4/m.1/cv1/act/Mul_output_0']
[E] Accuracy Summary | trt-runner-N0-12/12/23-10:08:30 vs. onnxrt-runner-N0-12/12/23-10:08:30 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 58.352s | Command: /root/.local/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
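The Minimum Required Tolerance line in this log reports the largest absolute difference (1.6212e-05) and the largest relative difference (0.20325); each is the tolerance that would be needed if it were the only one set. A large relative figure does not by itself indicate a large error: the relative difference divides by the reference value, so it grows without bound as that value approaches zero, and the histograms show many output values near zero. The sketch below uses hypothetical values and an assumed |out - ref| / |ref| definition; min_required_tolerances is a hypothetical helper. It shows the same 1e-6 absolute error giving a relative difference of 5e-7 on a reference of 2.0 but 0.2 on a reference of 5e-6.

```python
def min_required_tolerances(out, ref):
    """Largest absolute and relative differences over all elements (sketch).

    Each value is the tolerance that would be needed if it were the only one
    set. Relative difference is assumed to be |out - ref| / |ref|; elements
    with a zero reference are skipped for the relative maximum.
    """
    max_abs = 0.0
    max_rel = 0.0
    for o, r in zip(out, ref):
        d = abs(o - r)
        max_abs = max(max_abs, d)
        if r != 0.0:
            max_rel = max(max_rel, d / abs(r))
    return max_abs, max_rel


# Hypothetical pair: the same ~1e-6 absolute error on a large and a near-zero reference.
ref = [2.0, 5e-6]
out = [2.0 + 1e-6, 6e-6]
max_abs, max_rel = min_required_tolerances(out, ref)
print(f"abs={max_abs:.3g} rel={max_rel:.3g}")
```

Here the relative maximum comes entirely from the near-zero element, even though both elements carry essentially the same absolute error.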

Model

initial_reduced.zip (100.1 KB)

EDIT

ONNX graph visualization:

AastaLLL · December 13, 2023, 3:33am

Hi,

Could you test this with our latest software release, JetPack 6 with TensorRT 8.6?
You might need to build ONNX Runtime from source:

Thanks.

AastaLLL · December 14, 2023, 9:31am

Hi,

This issue doesn’t occur on TensorRT 8.6.
We get “All outputs matched” with Polygraphy on JetPack 6 DP.

Please give it a try:

$ ./TensorRT/tools/Polygraphy/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
[I] RUNNING | Command: ./TensorRT/tools/Polygraphy/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
[I] trt-runner-N0-12/14/23-09:24:33     | Activating and starting inference
[W] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I]     Configuring with profiles: [Profile().add('/model.4/cv1/conv/Conv_output_0', min=[1, 64, 24, 180], opt=[1, 64, 24, 180], max=[1, 64, 24, 180])]
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 15656.07 MiB, TACTIC_DRAM: 13765.00 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[I] Finished engine building in 56.670 seconds
[I] trt-runner-N0-12/14/23-09:24:33    
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] trt-runner-N0-12/14/23-09:24:33    
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] trt-runner-N0-12/14/23-09:24:33     | Completed 1 iteration(s) in 3.635 ms | Average inference time: 3.635 ms.
[I] onnxrt-runner-N0-12/14/23-09:24:33  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-12/14/23-09:24:33 
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] onnxrt-runner-N0-12/14/23-09:24:33 
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] onnxrt-runner-N0-12/14/23-09:24:33  | Completed 1 iteration(s) in 3.157 ms | Average inference time: 3.157 ms.
[I] Accuracy Comparison | trt-runner-N0-12/14/23-09:24:33 vs. onnxrt-runner-N0-12/14/23-09:24:33
[I]     Comparing Output: '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180)) with '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-12/14/23-09:24:33: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18074, min=-0.27846 at (0, 1, 3, 22), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I]         onnxrt-runner-N0-12/14/23-09:24:33: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18073, min=-0.27846 at (0, 10, 3, 124), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I]         Error Metrics: /model.4/m.1/cv1/act/Mul_output_0
[I]             Minimum Required Tolerance: elemwise error | [abs=1.4305e-06] OR [rel=0.047872] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=9.4814e-08, std-dev=1.1095e-07, var=1.231e-14, median=5.9605e-08, min=0 at (0, 0, 0, 2), max=1.4305e-06 at (0, 15, 18, 108), avg-magnitude=9.4814e-08
[I]             Relative Difference | Stats: mean=2.9185e-06, std-dev=0.00016465, var=2.7108e-08, median=2.752e-07, min=0 at (0, 0, 0, 2), max=0.047872 at (0, 21, 6, 16), avg-magnitude=2.9185e-06
[I]         PASSED | Output: '/model.4/m.1/cv1/act/Mul_output_0' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['/model.4/m.1/cv1/act/Mul_output_0']
[I] Accuracy Summary | trt-runner-N0-12/14/23-09:24:33 vs. onnxrt-runner-N0-12/14/23-09:24:33 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 62.945s | Command: ./TensorRT/tools/Polygraphy/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
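Putting the two runs’ error metrics side by side (values copied from the logs in this thread): on TensorRT 8.6 the largest absolute difference, 1.4305e-06, is below the default atol of 1e-05, so every element passes even though the largest relative difference is 0.047872. In the original post’s run, the largest absolute difference is 1.6212e-05, above atol. A small sketch of that arithmetic; passes_on_atol_alone is a hypothetical helper.

```python
ATOL = 1e-5  # default absolute tolerance shown in both logs ("abs=1e-05")

# Maximum element-wise differences copied from the two Polygraphy logs in this thread.
MAX_ABS = {"original post": 1.6212e-05, "TensorRT 8.6 (JetPack 6 DP)": 1.4305e-06}
MAX_REL = {"original post": 0.20325, "TensorRT 8.6 (JetPack 6 DP)": 0.047872}


def passes_on_atol_alone(max_abs: float, atol: float = ATOL) -> bool:
    """True means every element is within atol, so an element-wise check that
    passes on (abs <= atol OR rel <= rtol) passes whatever the relative
    differences are. False does not by itself prove failure: an element also
    has to exceed rtol to fail."""
    return max_abs <= atol


for run, max_abs in MAX_ABS.items():
    print(f"{run}: max_abs={max_abs:.4g}, max_rel={MAX_REL[run]:.4g}, "
          f"within atol everywhere: {passes_on_atol_alone(max_abs)}")

ratio = MAX_ABS["original post"] / MAX_ABS["TensorRT 8.6 (JetPack 6 DP)"]
print(f"largest absolute error is {ratio:.1f}x smaller on TensorRT 8.6")
```

Both maxima are small next to the output range (largest value 2.0988 in both runs); the difference between the runs is roughly an order of magnitude in the worst-case absolute error.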

For reference, here is a prebuilt onnxruntime 1.16.3 wheel for Python 3.10 on JetPack 6 DP:
onnxruntime_gpu-1.16.3-cp310-cp310-linux_aarch64.whl (55.4 MB)

Thanks.

alon2 · December 17, 2023, 8:14pm

Hey there, thanks for testing! We will be able to test on JetPack 6 in the future.
But today, we have production systems running JetPack 5.1.2. Moreover, as I understand it, JetPack 6 is still a Developer Preview, not a production release.

How can we upgrade TensorRT on those systems without updating the entire OS?
Can TensorRT 8.6 run on the same CUDA version as in JetPack 5.1.2?

Thanks!

AastaLLL · December 18, 2023, 7:21am

Hi,

Unfortunately, you will need to upgrade to rel-36 (JetPack 6) and CUDA 12 to run TensorRT 8.6.
The GA version should be released early next year:

Thanks.

alon2 · December 18, 2023, 8:22am

Hey, thanks for the quick response.
My understanding is that upgrading CUDA no longer depends on the JetPack version, is that right?

Can you suggest a workaround we can apply now to our production systems running JetPack 5.1.2?
Thanks

AastaLLL · December 19, 2023, 2:57am

Hi,

On JetPack 5, only CUDA can be upgraded.
cuDNN and TensorRT cannot.

We need to check with our internal team whether any workaround can be applied to TensorRT 8.5.
Thanks.

alon2 · December 31, 2023, 8:28am

Hey, any news regarding the workaround?
thanks

AastaLLL · January 2, 2024, 6:29am

Hi,

There is currently no known workaround (WAR) for this issue.
Would it be possible for you to wait for our upcoming JetPack 6 GA release?

Thanks.

alon2 · January 7, 2024, 8:57am

Much as we’d like to, it’s simply not feasible. We start shipping our systems to customers this month, and even if JetPack 6 were already in production, it wouldn’t fit our strict timeline.

AastaLLL · January 8, 2024, 9:12am

Hi,

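When running Polygraphy with all intermediate outputs marked (--trt-outputs mark all --onnx-outputs mark all) and --fail-fast, the reported difference comes from a convolution layer, at output /model.4/m.0/cv2/conv/Conv_output_0: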

[I]     Comparing Output: '/model.4/m.0/cv2/conv/Conv_output_0' (dtype=float32, shape=(1, 32, 24, 180)) with '/model.4/m.0/cv2/conv/Conv_output_0' (dtype=float32, shape=(1, 32, 24, 180))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-01/08/24-08:58:06: /model.4/m.0/cv2/conv/Conv_output_0 | Stats: mean=0.19992, std-dev=1.1593, var=1.3439, median=0.2368, min=-4.9056 at (0, 19, 20, 95), max=3.592 at (0, 13, 8, 90), avg-magnitude=0.94605
[I]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-4.91 , -4.06 ) |          5 | 
                (-4.06 , -3.21 ) |        204 | 
                (-3.21 , -2.36 ) |       2345 | ##
                (-2.36 , -1.51 ) |       8958 | #########
                (-1.51 , -0.657) |      19078 | ####################
                (-0.657, 0.193 ) |      36379 | #######################################
                (0.193 , 1.04  ) |      37135 | ########################################
                (1.04  , 1.89  ) |      25269 | ###########################
                (1.89  , 2.74  ) |       7833 | ########
                (2.74  , 3.59  ) |       1034 | #
[I]         onnxrt-runner-N0-01/08/24-08:58:06: /model.4/m.0/cv2/conv/Conv_output_0 | Stats: mean=0.19992, std-dev=1.1593, var=1.3439, median=0.2368, min=-4.9056 at (0, 19, 20, 95), max=3.592 at (0, 13, 8, 90), avg-magnitude=0.94605
[I]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-4.91 , -4.06 ) |          5 | 
                (-4.06 , -3.21 ) |        204 | 
                (-3.21 , -2.36 ) |       2345 | ##
                (-2.36 , -1.51 ) |       8959 | #########
                (-1.51 , -0.657) |      19077 | ####################
                (-0.657, 0.193 ) |      36379 | #######################################
                (0.193 , 1.04  ) |      37135 | ########################################
                (1.04  , 1.89  ) |      25269 | ###########################
                (1.89  , 2.74  ) |       7833 | ########
                (2.74  , 3.59  ) |       1034 | #
[I]         Error Metrics: /model.4/m.0/cv2/conv/Conv_output_0
[I]             Minimum Required Tolerance: elemwise error | [abs=2.2829e-05] OR [rel=0.53521] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=3.0769e-06, std-dev=2.556e-06, var=6.5334e-12, median=2.4438e-06, min=0 at (0, 0, 0, 23), max=2.2829e-05 at (0, 19, 12, 100), avg-magnitude=3.0769e-06
[I]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (0       , 2.28e-06) |      64846 | ########################################
                    (2.28e-06, 4.57e-06) |      42432 | ##########################
                    (4.57e-06, 6.85e-06) |      19275 | ###########
                    (6.85e-06, 9.13e-06) |       7469 | ####
                    (9.13e-06, 1.14e-05) |       2632 | #
                    (1.14e-05, 1.37e-05) |       1049 | 
                    (1.37e-05, 1.6e-05 ) |        369 | 
                    (1.6e-05 , 1.83e-05) |        125 | 
                    (1.83e-05, 2.05e-05) |         33 | 
                    (2.05e-05, 2.28e-05) |         10 | 
[I]             Relative Difference | Stats: mean=3.4775e-05, std-dev=0.0017934, var=3.2163e-06, median=3.093e-06, min=0 at (0, 0, 0, 23), max=0.53521 at (0, 10, 13, 103), avg-magnitude=3.4775e-05
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (0     , 0.0535) |     138227 | ########################################
                    (0.0535, 0.107 ) |          8 | 
                    (0.107 , 0.161 ) |          2 | 
                    (0.161 , 0.214 ) |          2 | 
                    (0.214 , 0.268 ) |          0 | 
                    (0.268 , 0.321 ) |          0 | 
                    (0.321 , 0.375 ) |          0 | 
                    (0.375 , 0.428 ) |          0 | 
                    (0.428 , 0.482 ) |          0 | 
                    (0.482 , 0.535 ) |          1 | 
[E]         FAILED | Output: '/model.4/m.0/cv2/conv/Conv_output_0' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E] FAILED | Runtime: 21.838s | Command: /home/nvidia/.local/bin/polygraphy run initial_reduced.onnx --trt --onnxrt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast
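A general background point on why two FP32 implementations of the same convolution need not agree bit for bit: floating-point addition is not associative, and a convolution output is a long sum of products that different kernels (for example, different TensorRT tactics, or ONNX Runtime’s CPU implementation) may accumulate in different orders. Whether this is what drives the TensorRT 8.5 discrepancy here is not established in this thread, where NVIDIA reports that the mismatch no longer occurs on TensorRT 8.6. The sketch below only demonstrates the effect. It emulates float32 arithmetic in pure Python via struct; the helper names, data, and sizes are hypothetical.

```python
import random
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Left-to-right sum, rounding to float32 after every addition.

    Adding two float32 values in binary64 and rounding once to float32 gives
    the correctly rounded float32 sum, so this emulates float32 accumulation.
    """
    acc = 0.0
    for v in values:
        acc = f32(acc + v)
    return acc


def sum_f32_pairwise(values):
    """Tree (pairwise) reduction of the same values, also in float32."""
    terms = list(values)
    while len(terms) > 1:
        nxt = [f32(terms[i] + terms[i + 1]) for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            nxt.append(terms[-1])
        terms = nxt
    return terms[0] if terms else 0.0


# Same three numbers, two orders: float32 addition is not associative.
print(sum_f32([1e8, 1.0, -1e8]))  # prints 0.0 (1e8 + 1 rounds back to 1e8)
print(sum_f32([1e8, -1e8, 1.0]))  # prints 1.0 (cancellation happens first)

# Hypothetical conv-like dot product: 576 products (e.g. a 3x3 kernel over 64 channels).
# A product of two float32 values is exact in binary64, so f32() of it is the
# correctly rounded float32 product.
random.seed(0)
products = [f32(f32(random.gauss(0.0, 1.0)) * f32(random.gauss(0.0, 0.05))) for _ in range(576)]
seq = sum_f32(products)
tree = sum_f32_pairwise(products)
print(f"sequential={seq:.9g} pairwise={tree:.9g} |diff|={abs(seq - tree):.3g}")
```

The two accumulation orders of the same 576 products typically differ in the last few bits, which is the same order of magnitude as the absolute differences reported above.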

We are double-checking whether this matches a Conv issue that has since been fixed, relative to TensorRT 8.5.
We will update you with more information soon.

Thanks.

AastaLLL · January 15, 2024, 4:50am

Hi,

Unfortunately, we don’t have the resources to investigate this issue on TensorRT 8.5.
Since it does not occur in the latest software release, its priority is relatively low.

Thanks.
