We are seeing an output mismatch with TensorRT on the NVIDIA Orin NX platform when converting models from ONNX to TensorRT. This happens even though we build the engine in 32-bit floating point (FP32), where we did not expect accuracy discrepancies.
Here are the details of our situation:
- Hardware: NVIDIA Orin Dev kit, running as Orin NX 16GB
- Software: We are using the latest compatible versions of ONNX and TensorRT.
- Issue Description: After converting our model from ONNX to TensorRT, the TensorRT outputs no longer match the ONNX Runtime outputs. This is concerning because it directly affects the accuracy and reliability of our model's predictions.
- Steps Taken: To narrow down the issue, we used Polygraphy and pinpointed the minimal subgraph that reproduces the problem. We build the engine in FP32, on the assumption that this avoids the accuracy issues usually associated with reduced-precision conversion from ONNX to TensorRT.
- Reproduce: run
polygraphy run initial_reduced.onnx --trt --onnxrt
with the attached model
Our objective is to achieve a precise and consistent conversion of our models from ONNX to TensorRT, without facing output accuracy issues. I am seeking insights, advice, or similar experiences from the community regarding this matter.
If anyone has faced a similar situation or has suggestions on troubleshooting methods, configuration adjustments, or updates that might aid in resolving this issue, your input would be highly valued.
Thanks!
Extra information:
Polygraphy log
>polygraphy run initial_reduced.onnx --trt --onnxrt
[I] RUNNING | Command: /root/.local/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
[I] trt-runner-N0-12/12/23-10:08:30 | Activating and starting inference
[W] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] Configuring with profiles:[
        Profile 0:
            {/model.4/cv1/conv/Conv_output_0 [min=[1, 64, 24, 180], opt=[1, 64, 24, 180], max=[1, 64, 24, 180]]}
    ]
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 15824.66 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
[I] Finished engine building in 51.652 seconds
[I] trt-runner-N0-12/12/23-10:08:30
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] trt-runner-N0-12/12/23-10:08:30
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] trt-runner-N0-12/12/23-10:08:30 | Completed 1 iteration(s) in 1444 ms | Average inference time: 1444 ms.
[I] onnxrt-runner-N0-12/12/23-10:08:30 | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-12/12/23-10:08:30
    ---- Inference Input(s) ----
    {/model.4/cv1/conv/Conv_output_0 [dtype=float32, shape=(1, 64, 24, 180)]}
[I] onnxrt-runner-N0-12/12/23-10:08:30
    ---- Inference Output(s) ----
    {/model.4/m.1/cv1/act/Mul_output_0 [dtype=float32, shape=(1, 32, 24, 180)]}
[I] onnxrt-runner-N0-12/12/23-10:08:30 | Completed 1 iteration(s) in 4.776 ms | Average inference time: 4.776 ms.
[I] Accuracy Comparison | trt-runner-N0-12/12/23-10:08:30 vs. onnxrt-runner-N0-12/12/23-10:08:30
[I] Comparing Output: '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180)) with '/model.4/m.1/cv1/act/Mul_output_0' (dtype=float32, shape=(1, 32, 24, 180))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-12/12/23-10:08:30: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18074, min=-0.27846 at (0, 1, 0, 28), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I] ---- Histogram ----
    Bin Range          | Num Elems | Visualization
    (-0.278 , -0.0407) |     94489 | ########################################
    (-0.0407, 0.197 )  |     23527 | #########
    (0.197  , 0.435 )  |     12451 | #####
    (0.435  , 0.672 )  |      4480 | #
    (0.672  , 0.91  )  |      1967 |
    (0.91   , 1.15  )  |       926 |
    (1.15   , 1.39  )  |       317 |
    (1.39   , 1.62  )  |        72 |
    (1.62   , 1.86  )  |        10 |
    (1.86   , 2.1   )  |         1 |
[I] onnxrt-runner-N0-12/12/23-10:08:30: /model.4/m.1/cv1/act/Mul_output_0 | Stats: mean=-0.069648, std-dev=0.26339, var=0.069374, median=-0.18073, min=-0.27846 at (0, 10, 3, 124), max=2.0988 at (0, 31, 0, 0), avg-magnitude=0.2249
[I] ---- Histogram ----
    Bin Range          | Num Elems | Visualization
    (-0.278 , -0.0407) |     94489 | ########################################
    (-0.0407, 0.197 )  |     23527 | #########
    (0.197  , 0.435 )  |     12451 | #####
    (0.435  , 0.672 )  |      4480 | #
    (0.672  , 0.91  )  |      1967 |
    (0.91   , 1.15  )  |       926 |
    (1.15   , 1.39  )  |       317 |
    (1.39   , 1.62  )  |        72 |
    (1.62   , 1.86  )  |        10 |
    (1.86   , 2.1   )  |         1 |
[I] Error Metrics: /model.4/m.1/cv1/act/Mul_output_0
[I] Minimum Required Tolerance: elemwise error | [abs=1.6212e-05] OR [rel=0.20325] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=1.4257e-06, std-dev=1.5692e-06, var=2.4625e-12, median=8.9407e-07, min=0 at (0, 0, 1, 42), max=1.6212e-05 at (0, 27, 21, 73), avg-magnitude=1.4257e-06
[I] ---- Histogram ----
    Bin Range            | Num Elems | Visualization
    (0       , 1.62e-06) |     97545 | ########################################
    (1.62e-06, 3.24e-06) |     26306 | ##########
    (3.24e-06, 4.86e-06) |      7343 | ###
    (4.86e-06, 6.48e-06) |      3949 | #
    (6.48e-06, 8.11e-06) |      2447 | #
    (8.11e-06, 9.73e-06) |       631 |
    (9.73e-06, 1.13e-05) |        17 |
    (1.13e-05, 1.3e-05 ) |         0 |
    (1.3e-05 , 1.46e-05) |         1 |
    (1.46e-05, 1.62e-05) |         1 |
[I] Relative Difference | Stats: mean=2.5755e-05, std-dev=0.00073652, var=5.4246e-07, median=4.5301e-06, min=0 at (0, 0, 1, 42), max=0.20325 at (0, 10, 8, 19), avg-magnitude=2.5755e-05
[I] ---- Histogram ----
    Bin Range        | Num Elems | Visualization
    (0     , 0.0203) |    138226 | ########################################
    (0.0203, 0.0406) |         5 |
    (0.0406, 0.061 ) |         6 |
    (0.061 , 0.0813) |         1 |
    (0.0813, 0.102 ) |         1 |
    (0.102 , 0.122 ) |         0 |
    (0.122 , 0.142 ) |         0 |
    (0.142 , 0.163 ) |         0 |
    (0.163 , 0.183 ) |         0 |
    (0.183 , 0.203 ) |         1 |
[E] FAILED | Output: '/model.4/m.1/cv1/act/Mul_output_0' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E] FAILED | Mismatched outputs: ['/model.4/m.1/cv1/act/Mul_output_0']
[E] Accuracy Summary | trt-runner-N0-12/12/23-10:08:30 vs. onnxrt-runner-N0-12/12/23-10:08:30 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 58.352s | Command: /root/.local/bin/polygraphy run initial_reduced.onnx --trt --onnxrt
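To help read the error metrics in the log (minimum required tolerance abs=1.6212e-05 OR rel=0.20325), here is a minimal NumPy sketch of an elementwise tolerance check. It assumes the NumPy-style rule |out - ref| <= atol + rtol * |ref|. I have not confirmed that Polygraphy applies exactly this rule, and the function names and sample values below are hypothetical, not taken from our model. It only illustrates how a tiny absolute difference on a near-zero element can turn into a large relative difference.

```python
import numpy as np

def elemwise_mismatches(out, ref, atol=1e-5, rtol=1e-5):
    """Boolean mask of elements that violate |out - ref| <= atol + rtol * |ref|.

    Assumption: this mirrors a NumPy isclose-style check; the real
    comparator in any given tool may differ.
    """
    out = np.asarray(out, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    return np.abs(out - ref) > atol + rtol * np.abs(ref)

def required_tolerances(out, ref):
    """Smallest abs-only and rel-only tolerances that would make all elements pass."""
    out = np.asarray(out, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    absdiff = np.abs(out - ref)
    # Guard against division by zero for exactly-zero reference values.
    with np.errstate(divide="ignore", invalid="ignore"):
        reldiff = np.where(ref != 0, absdiff / np.abs(ref), 0.0)
    return absdiff.max(), reldiff.max()

# Hypothetical reference values: two "normal" activations and one close to zero.
ref = np.array([2.0988, -0.27846, 6.0e-05])
# Hypothetical second-runner output: small absolute perturbations only.
out = ref + np.array([1.0e-06, 1.0e-06, 1.2e-05])

mask = elemwise_mismatches(out, ref)
min_abs, max_rel = required_tolerances(out, ref)

print("mismatched elements:", mask)
print("required abs tolerance:", min_abs)
print("required rel tolerance:", max_rel)
print("passes with atol=1e-4:", not elemwise_mismatches(out, ref, atol=1e-4).any())
```

In this sketch only the near-zero element fails, and it fails by an absolute margin on the order of 1e-5, which is the same pattern as in the log: the absolute difference stays around 1e-5 while the relative difference looks large.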
Model
initial_reduced.zip (100.1 KB)