ONNX to TensorRT output mismatch

We are facing a challenge with TensorRT on the NVIDIA Orin NX platform. Our team has encountered an output mismatch when converting models from ONNX to TensorRT, despite building the engine in FP32 (32-bit floating point), where we did not anticipate accuracy discrepancies.

Here are the details of our situation:

  • Hardware: NVIDIA Orin Dev kit, running as Orin NX 16GB

  • Software: We are using the latest compatible versions of ONNX and TensorRT.

  • Issue Description: After converting our model from ONNX to TensorRT, we observed a mismatch between the TensorRT and ONNX Runtime outputs. This is particularly concerning because it directly impacts the accuracy and reliability of our model’s predictions.

  • Steps Taken: To narrow down the issue, we used Polygraphy and pinpointed the minimal subgraph that reproduces the problem (a sketch of the reduction command appears after this list). We also build the engine in FP32, under the assumption that this would mitigate the accuracy issues typically associated with converting from ONNX to TensorRT.

  • Reproduce: run polygraphy run initial_reduced.onnx --trt --onnxrt with the attached model
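The reduction step was done with Polygraphy's debug tools; the sketch below shows the general shape of such a command, assuming the full model is named full_model.onnx (a placeholder, not the actual filename):

# Sketch: "polygraphy debug reduce" bisects the ONNX graph, writing each
# candidate subgraph to polygraphy_debug.onnx (the default artifact name) and
# keeping the smallest one that still fails the --check command (here, the
# same TRT-vs-ONNXRT comparison used to reproduce the mismatch).
$ polygraphy debug reduce full_model.onnx -o initial_reduced.onnx \
    --check polygraphy run polygraphy_debug.onnx --trt --onnxrt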

Our objective is to achieve a precise and consistent conversion of our models from ONNX to TensorRT, without facing output accuracy issues. I am seeking insights, advice, or similar experiences from the community regarding this matter.

If anyone has faced a similar situation or has suggestions on troubleshooting methods, configuration adjustments, or updates that might aid in resolving this issue, your input would be highly valued.

Thanks!

Extra information:

Polygraphy log

$ polygraphy run initial_reduced.onnx --trt --onnxrt
[I] RUNNING | Command: /root/.local/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
[I] trt-runner-N0-12/12/23-10:08:30     | Activating and starting inference
[W] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I]     Configuring with profiles: [Profile 0: {/model.4/cv1/conv/Conv_output_0 [min=[1, 64, 24, 180], opt=[1, 64, 24, 180], max=[1, 64, 24, 180]]}]
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 15824.66 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
[I] Finished engine building in 51.652 seconds
[I] trt-runner-N0-12/12/23-10:08:30
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] trt-runner-N0-12/12/23-10:08:30
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] trt-runner-N0-12/12/23-10:08:30     | Completed 1 iteration(s) in 1444 ms | Average inference time: 1444 ms.
[I] onnxrt-runner-N0-12/12/23-10:08:30  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-12/12/23-10:08:30
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] onnxrt-runner-N0-12/12/23-10:08:30
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] onnxrt-runner-N0-12/12/23-10:08:30  | Completed 1 iteration(s) in 4.776 ms | Average inference time: 4.776 ms.
[I] Accuracy Comparison | trt-runner-N0-12/12/23-10:08:30 vs. onnxrt-runner-N0-12/12/23-10:08:30
[I]     Comparing Output: '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180)) with '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-12/12/23-10:08:30: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18074, min=-0.27846 at (0, 1, 0, 28), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.278 , -0.0407) |      94489 | ########################################
                (-0.0407, 0.197  ) |      23527 | #########
                (0.197  , 0.435  ) |      12451 | #####
                (0.435  , 0.672  ) |       4480 | #
                (0.672  , 0.91   ) |       1967 | 
                (0.91   , 1.15   ) |        926 | 
                (1.15   , 1.39   ) |        317 | 
                (1.39   , 1.62   ) |         72 | 
                (1.62   , 1.86   ) |         10 | 
                (1.86   , 2.1    ) |          1 | 
[I]         onnxrt-runner-N0-12/12/23-10:08:30: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18073, min=-0.27846 at (0, 10, 3, 124), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.278 , -0.0407) |      94489 | ########################################
                (-0.0407, 0.197  ) |      23527 | #########
                (0.197  , 0.435  ) |      12451 | #####
                (0.435  , 0.672  ) |       4480 | #
                (0.672  , 0.91   ) |       1967 | 
                (0.91   , 1.15   ) |        926 | 
                (1.15   , 1.39   ) |        317 | 
                (1.39   , 1.62   ) |         72 | 
                (1.62   , 1.86   ) |         10 | 
                (1.86   , 2.1    ) |          1 | 
[I]         Error Metrics: /model.4/m.1/cv1/act/Mul_output_0
[I]             Minimum Required Tolerance: elemwise error | [abs=1.6212e-05] OR [rel=0.20325] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=1.4257e-06, std-dev=1.5692e-06, var=2.4625e-12, median=8.9407e-07, min=0 at (0, 0, 1, 42), max=1.6212e-05 at (0, 27, 21, 73), avg-magnitude=1.4257e-06
[I]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (0       , 1.62e-06) |      97545 | ########################################
                    (1.62e-06, 3.24e-06) |      26306 | ##########
                    (3.24e-06, 4.86e-06) |       7343 | ###
                    (4.86e-06, 6.48e-06) |       3949 | #
                    (6.48e-06, 8.11e-06) |       2447 | #
                    (8.11e-06, 9.73e-06) |        631 | 
                    (9.73e-06, 1.13e-05) |         17 | 
                    (1.13e-05, 1.3e-05 ) |          0 | 
                    (1.3e-05 , 1.46e-05) |          1 | 
                    (1.46e-05, 1.62e-05) |          1 | 
[I]             Relative Difference | Stats: mean=2.5755e-05, std-dev=0.00073652, var=5.4246e-07, median=4.5301e-06, min=0 at (0, 0, 1, 42), max=0.20325 at (0, 10, 8, 19), avg-magnitude=2.5755e-05
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (0     , 0.0203) |     138226 | ########################################
                    (0.0203, 0.0406) |          5 | 
                    (0.0406, 0.061 ) |          6 | 
                    (0.061 , 0.0813) |          1 | 
                    (0.0813, 0.102 ) |          1 | 
                    (0.102 , 0.122 ) |          0 | 
                    (0.122 , 0.142 ) |          0 | 
                    (0.142 , 0.163 ) |          0 | 
                    (0.163 , 0.183 ) |          0 | 
                    (0.183 , 0.203 ) |          1 | 
[E]         FAILED | Output: '/model.4/m.1/cv1/act/Mul_output_0' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['/model.4/m.1/cv1/act/Mul_output_0']
[E] Accuracy Summary | trt-runner-N0-12/12/23-10:08:30 vs. onnxrt-runner-N0-12/12/23-10:08:30 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 58.352s | Command: /root/.local/bin/polygraphy run initial_reduced.onnx --trt --onnxrt

Model

initial_reduced.zip (100.1 KB)

EDIT

ONNX graph visualization:

Hi,

Please test this with our latest JetPack 6 + TensorRT 8.6 software release.
You might need to build ONNXRuntime from source:
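For reference, a from-source ONNXRuntime build on Jetson with the CUDA and TensorRT execution providers typically looks roughly like the sketch below; the exact paths and flags depend on your JetPack install, so treat it as an assumption rather than an exact recipe:

# Sketch only: build ONNXRuntime from source on Jetson with the CUDA and
# TensorRT execution providers. Paths assume a default JetPack layout.
$ git clone --recursive https://github.com/microsoft/onnxruntime
$ cd onnxruntime
$ ./build.sh --config Release --update --build --parallel --build_wheel \
    --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
    --use_tensorrt --tensorrt_home /usr/lib/aarch64-linux-gnu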

Thanks.

Hi,

This issue doesn’t occur on TensorRT 8.6.
We can get “All outputs matched” with Polygraphy on JetPack 6 DP.

Please give it a try:

$ ./TensorRT/tools/Polygraphy/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
[I] RUNNING | Command: ./TensorRT/tools/Polygraphy/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
[I] trt-runner-N0-12/14/23-09:24:33     | Activating and starting inference
[W] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I]     Configuring with profiles: [Profile().add('/model.4/cv1/conv/Conv_output_0', min=[1, 64, 24, 180], opt=[1, 64, 24, 180], max=[1, 64, 24, 180])]
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 15656.07 MiB, TACTIC_DRAM: 13765.00 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[I] Finished engine building in 56.670 seconds
[I] trt-runner-N0-12/14/23-09:24:33    
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] trt-runner-N0-12/14/23-09:24:33    
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] trt-runner-N0-12/14/23-09:24:33     | Completed 1 iteration(s) in 3.635 ms | Average inference time: 3.635 ms.
[I] onnxrt-runner-N0-12/14/23-09:24:33  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-12/14/23-09:24:33 
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] onnxrt-runner-N0-12/14/23-09:24:33 
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] onnxrt-runner-N0-12/14/23-09:24:33  | Completed 1 iteration(s) in 3.157 ms | Average inference time: 3.157 ms.
[I] Accuracy Comparison | trt-runner-N0-12/14/23-09:24:33 vs. onnxrt-runner-N0-12/14/23-09:24:33
[I]     Comparing Output: '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180)) with '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-12/14/23-09:24:33: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18074, min=-0.27846 at (0, 1, 3, 22), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I]         onnxrt-runner-N0-12/14/23-09:24:33: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18073, min=-0.27846 at (0, 10, 3, 124), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I]         Error Metrics: /model.4/m.1/cv1/act/Mul_output_0
[I]             Minimum Required Tolerance: elemwise error | [abs=1.4305e-06] OR [rel=0.047872] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=9.4814e-08, std-dev=1.1095e-07, var=1.231e-14, median=5.9605e-08, min=0 at (0, 0, 0, 2), max=1.4305e-06 at (0, 15, 18, 108), avg-magnitude=9.4814e-08
[I]             Relative Difference | Stats: mean=2.9185e-06, std-dev=0.00016465, var=2.7108e-08, median=2.752e-07, min=0 at (0, 0, 0, 2), max=0.047872 at (0, 21, 6, 16), avg-magnitude=2.9185e-06
[I]         PASSED | Output: '/model.4/m.1/cv1/act/Mul_output_0' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['/model.4/m.1/cv1/act/Mul_output_0']
[I] Accuracy Summary | trt-runner-N0-12/14/23-09:24:33 vs. onnxrt-runner-N0-12/14/23-09:24:33 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 62.945s | Command: ./TensorRT/tools/Polygraphy/bin/polygraphy run initial_reduced.onnx --trt --onnxrt

Here is the ONNXRuntime 1.16.3 wheel prebuilt for Python 3.10 + JetPack 6 DP for your reference:
onnxruntime_gpu-1.16.3-cp310-cp310-linux_aarch64.whl (55.4 MB)
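Assuming Python 3.10 on JetPack 6 DP, installing the wheel is a one-liner:

# Install the prebuilt ONNXRuntime wheel attached above
$ pip3 install onnxruntime_gpu-1.16.3-cp310-cp310-linux_aarch64.whl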

Thanks.

Hey there, thanks for testing! We will be able to test on JetPack 6 in the future.
But today we have production systems that use JetPack 5.1.2. Moreover, as I understand it, JetPack 6 is still a Developer Preview, not a production release.

  1. How can we upgrade TensorRT on those systems without updating the entire OS?
  2. Can TensorRT 8.6 run on the same CUDA version as JetPack 5.1.2?

Thanks!

Hi,

Unfortunately, you will need to upgrade to rel-36 & CUDA 12 to run TensorRT 8.6.
The GA version should be released early next year:

Thanks.

Hey, thanks for the quick response.
Do I understand correctly that upgrading CUDA no longer depends on the JetPack version?

Can you suggest a workaround we can apply today on the production systems that run JetPack 5.1.2?
Thanks

Hi,

On JetPack 5, only CUDA is upgradable.
cuDNN and TensorRT are not.
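For reference, you can confirm which versions are currently pinned on the device with standard commands, for example:

# Check the L4T release plus the installed CUDA/cuDNN/TensorRT packages
$ cat /etc/nv_tegra_release
$ nvcc --version
$ dpkg -l | grep -E "nvinfer|cudnn"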

We need to check with the internal team to see if there is any workaround that can be applied to TensorRT 8.5.
Thanks.


Hey, any news regarding the workaround?
Thanks

Hi,

This issue doesn’t have a known workaround (WAR) currently.
Is it possible for you to wait for our next JetPack 6 GA release?

Thanks.

Despite my desire, it’s simply not feasible. We begin shipping our systems to customers this month. Even if JetPack 6 were in production, it wouldn’t align with our strict timeline.

Hi,

When running Polygraphy with --fail-fast and all layer outputs marked for comparison, the first mismatch appears at a convolution layer.
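The comparison command was essentially the following (it mirrors the command recorded at the end of the log):

# Mark every tensor as an output in both runners and stop at the first mismatch
$ polygraphy run initial_reduced.onnx --trt --onnxrt \
    --trt-outputs mark all --onnx-outputs mark all --fail-fast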

[I]     Comparing Output: '/model.4/m.0/cv2/conv/Conv_output_0' (dtype=float32, shape=(1, 32, 24, 180)) with '/model.4/m.0/cv2/conv/Conv_output_0' (dtype=float32, shape=(1, 32, 24, 180))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-01/08/24-08:58:06: /model.4/m.0/cv2/conv/Conv_output_0 | Stats: mean=0.19992, std-dev=1.1593, var=1.3439, median=0.2368, min=-4.9056 at (0, 19, 20, 95), max=3.592 at (0, 13, 8, 90), avg-magnitude=0.94605
[I]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-4.91 , -4.06 ) |          5 | 
                (-4.06 , -3.21 ) |        204 | 
                (-3.21 , -2.36 ) |       2345 | ##
                (-2.36 , -1.51 ) |       8958 | #########
                (-1.51 , -0.657) |      19078 | ####################
                (-0.657, 0.193 ) |      36379 | #######################################
                (0.193 , 1.04  ) |      37135 | ########################################
                (1.04  , 1.89  ) |      25269 | ###########################
                (1.89  , 2.74  ) |       7833 | ########
                (2.74  , 3.59  ) |       1034 | #
[I]         onnxrt-runner-N0-01/08/24-08:58:06: /model.4/m.0/cv2/conv/Conv_output_0 | Stats: mean=0.19992, std-dev=1.1593, var=1.3439, median=0.2368, min=-4.9056 at (0, 19, 20, 95), max=3.592 at (0, 13, 8, 90), avg-magnitude=0.94605
[I]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-4.91 , -4.06 ) |          5 | 
                (-4.06 , -3.21 ) |        204 | 
                (-3.21 , -2.36 ) |       2345 | ##
                (-2.36 , -1.51 ) |       8959 | #########
                (-1.51 , -0.657) |      19077 | ####################
                (-0.657, 0.193 ) |      36379 | #######################################
                (0.193 , 1.04  ) |      37135 | ########################################
                (1.04  , 1.89  ) |      25269 | ###########################
                (1.89  , 2.74  ) |       7833 | ########
                (2.74  , 3.59  ) |       1034 | #
[I]         Error Metrics: /model.4/m.0/cv2/conv/Conv_output_0
[I]             Minimum Required Tolerance: elemwise error | [abs=2.2829e-05] OR [rel=0.53521] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=3.0769e-06, std-dev=2.556e-06, var=6.5334e-12, median=2.4438e-06, min=0 at (0, 0, 0, 23), max=2.2829e-05 at (0, 19, 12, 100), avg-magnitude=3.0769e-06
[I]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (0       , 2.28e-06) |      64846 | ########################################
                    (2.28e-06, 4.57e-06) |      42432 | ##########################
                    (4.57e-06, 6.85e-06) |      19275 | ###########
                    (6.85e-06, 9.13e-06) |       7469 | ####
                    (9.13e-06, 1.14e-05) |       2632 | #
                    (1.14e-05, 1.37e-05) |       1049 | 
                    (1.37e-05, 1.6e-05 ) |        369 | 
                    (1.6e-05 , 1.83e-05) |        125 | 
                    (1.83e-05, 2.05e-05) |         33 | 
                    (2.05e-05, 2.28e-05) |         10 | 
[I]             Relative Difference | Stats: mean=3.4775e-05, std-dev=0.0017934, var=3.2163e-06, median=3.093e-06, min=0 at (0, 0, 0, 23), max=0.53521 at (0, 10, 13, 103), avg-magnitude=3.4775e-05
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (0     , 0.0535) |     138227 | ########################################
                    (0.0535, 0.107 ) |          8 | 
                    (0.107 , 0.161 ) |          2 | 
                    (0.161 , 0.214 ) |          2 | 
                    (0.214 , 0.268 ) |          0 | 
                    (0.268 , 0.321 ) |          0 | 
                    (0.321 , 0.375 ) |          0 | 
                    (0.375 , 0.428 ) |          0 | 
                    (0.428 , 0.482 ) |          0 | 
                    (0.482 , 0.535 ) |          1 | 
[E]         FAILED | Output: '/model.4/m.0/cv2/conv/Conv_output_0' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E] FAILED | Runtime: 21.838s | Command: /home/nvidia/.local/bin/polygraphy run initial_reduced.onnx --trt --onnxrt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast

We are double-checking whether this Conv issue was fixed in any TensorRT 8.5 update.
We will share more info with you soon.

Thanks.

Hi,

Unfortunately, we don’t have the resources to check this issue on TensorRT 8.5.
Since the issue is not observed in the latest software release, its priority is relatively low.

Thanks.