Description
I am converting a semantic segmentation model from ONNX to TensorRT format. The model converts successfully, but the outputs are not within the acceptable atol and rtol of 1e-3. I am using Polygraphy for the conversion, with the following command:
polygraphy run model.onnx --trt --onnxrt --precision-constraints obey --save-engine model.engine --providers CUDAExecutionProvider --atol 1e-3 --rtol 1e-3
I would like some help determining whether there are customizations I can make so that the TensorRT model's outputs stay within an atol and rtol of 1e-3. Model accuracy is of utmost importance in our case.
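For context on what the pass/fail criterion means, Polygraphy's default elementwise comparison is similar in spirit to NumPy's `np.isclose`: an element passes if its absolute difference is within `atol + rtol * |expected|`. A minimal sketch of that check with made-up values (my own illustration, not Polygraphy's actual code):

```python
import numpy as np

# Hypothetical runner outputs (illustrative values only; the real tensors
# have shape (1, 3, 512, 512)).
onnx_out = np.array([1.7e-07, 0.5, 0.99998], dtype=np.float32)
trt_out  = np.array([1.8e-07, 0.5, 0.99997], dtype=np.float32)

atol, rtol = 1e-3, 1e-3

# An element passes if |trt - onnx| <= atol + rtol * |onnx|.
abs_diff = np.abs(trt_out - onnx_out)
passes = abs_diff <= atol + rtol * np.abs(onnx_out)
print(passes.all())  # True for these toy values
```

With both tolerances set, an element can pass even if one of the two metrics alone would fail, which is what the log's note "requirements may be lower if both abs/rel tolerances are set" refers to.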
Environment
TensorRT Version: 10.8.0.43
GPU Type: GeForce RTX 2060
Nvidia Driver Version: 572.60
CUDA Version: 12.8
CUDNN Version: v9.5
Operating System + Version: Windows 24H2 26100.3476
Python Version (if applicable): 3.12.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
I can’t share the model files here, but I am happy to share them privately.
Steps To Reproduce
Please run the polygraphy command mentioned above.
(onnx-opt) C:\Users\msrir\dev\VisionKit\customers\aiv>polygraphy run model.onnx --trt --onnxrt --precision-constraints obey --save-engine model.engine --providers CUDAExecutionProvider --atol 1e-3 --rtol 1e-3
[I] RUNNING | Command: \\?\C:\Users\msrir\anaconda3\envs\onnx-opt\Scripts\polygraphy run model.onnx --trt --onnxrt --precision-constraints obey --save-engine model.engine --providers CUDAExecutionProvider --atol 1e-3 --rtol 1e-3
[I] TF32 is disabled by default. Turn on TF32 for better performance with minor accuracy differences.
[I] TF32 is disabled by default. Turn on TF32 for better performance with minor accuracy differences.
[I] trt-runner-N0-03/13/25-15:32:52 | Activating and starting inference
[I] Configuring with profiles:[
Profile 0:
{data [min=[1, 3, 512, 512], opt=[1, 3, 512, 512], max=[1, 3, 512, 512]]}
]
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Building engine with configuration:
Flags | [OBEY_PRECISION_CONSTRAINTS]
Engine Capability | EngineCapability.STANDARD
Memory Pools | [WORKSPACE: 6143.69 MiB, TACTIC_DRAM: 6143.69 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
Tactic Sources | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
Preview Features | [PROFILE_SHARING_0806]
[I] Finished engine building in 106.870 seconds
[I] trt-runner-N0-03/13/25-15:32:52
---- Inference Input(s) ----
{data [dtype=float32, shape=(1, 3, 512, 512)]}
[I] trt-runner-N0-03/13/25-15:32:52
---- Inference Output(s) ----
{output [dtype=float32, shape=(1, 3, 512, 512)]}
[I] trt-runner-N0-03/13/25-15:32:52 | Completed 1 iteration(s) in 2824 ms | Average inference time: 2824 ms.
[I] onnxrt-runner-N0-03/13/25-15:32:52 | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CUDAExecutionProvider']
[I] onnxrt-runner-N0-03/13/25-15:32:52
---- Inference Input(s) ----
{data [dtype=float32, shape=(1, 3, 512, 512)]}
[I] onnxrt-runner-N0-03/13/25-15:32:52
---- Inference Output(s) ----
{output [dtype=float32, shape=(1, 3, 512, 512)]}
[I] onnxrt-runner-N0-03/13/25-15:32:52 | Completed 1 iteration(s) in 761 ms | Average inference time: 761 ms.
[I] Accuracy Comparison | trt-runner-N0-03/13/25-15:32:52 vs. onnxrt-runner-N0-03/13/25-15:32:52
[I] Comparing Output: 'output' (dtype=float32, shape=(1, 3, 512, 512)) with 'output' (dtype=float32, shape=(1, 3, 512, 512))
[I] Tolerance: [abs=0.001, rel=0.001] | Checking elemwise error
[I] trt-runner-N0-03/13/25-15:32:52: output | Stats: mean=0.33328, std-dev=0.47133, var=0.22215, median=8.0377e-07, min=1.84e-07 at (0, 1, 2, 6), max=0.99997 at (0, 0, 4, 6), avg-magnitude=0.33328, p90=0.99997, p95=0.99997, p99=0.99997
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(1.68e-07, 0.1) | 524288 | ########################################
(0.1 , 0.2) | 0 |
(0.2 , 0.3) | 0 |
(0.3 , 0.4) | 0 |
(0.4 , 0.5) | 0 |
(0.5 , 0.6) | 0 |
(0.6 , 0.7) | 0 |
(0.7 , 0.8) | 0 |
(0.8 , 0.9) | 0 |
(0.9 , 1 ) | 262144 | ####################
[I] onnxrt-runner-N0-03/13/25-15:32:52: output | Stats: mean=0.33328, std-dev=0.47133, var=0.22215, median=6.5396e-07, min=1.6816e-07 at (0, 1, 0, 6), max=0.99998 at (0, 0, 4, 5), avg-magnitude=0.33328, p90=0.99998, p95=0.99998, p99=0.99998
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(1.68e-07, 0.1) | 524288 | ########################################
(0.1 , 0.2) | 0 |
(0.2 , 0.3) | 0 |
(0.3 , 0.4) | 0 |
(0.4 , 0.5) | 0 |
(0.5 , 0.6) | 0 |
(0.6 , 0.7) | 0 |
(0.7 , 0.8) | 0 |
(0.8 , 0.9) | 0 |
(0.9 , 1 ) | 262144 | ####################
[I] Error Metrics: output
[I] Minimum Required Tolerance: elemwise error | [abs=0.0016909] OR [rel=0.44752] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=4.8133e-06, std-dev=4.7414e-05, var=2.2481e-09, median=1.4981e-07, min=6.8212e-12 at (0, 1, 0, 433), max=0.0016909 at (0, 0, 0, 422), avg-magnitude=4.8133e-06, p90=7.1526e-06, p95=7.1526e-06, p99=7.5102e-06
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(6.82e-12, 0.000169) | 784397 | ########################################
(0.000169, 0.000338) | 502 |
(0.000338, 0.000507) | 305 |
(0.000507, 0.000676) | 298 |
(0.000676, 0.000845) | 16 |
(0.000845, 0.00101 ) | 56 |
(0.00101 , 0.00118 ) | 208 |
(0.00118 , 0.00135 ) | 334 |
(0.00135 , 0.00152 ) | 178 |
(0.00152 , 0.00169 ) | 138 |
[I] Relative Difference | Stats: mean=0.10717, std-dev=0.093952, var=0.008827, median=0.09343, min=1.193e-07 at (0, 0, 2, 436), max=0.44752 at (0, 1, 510, 510), avg-magnitude=0.10717, p90=0.22908, p95=0.22908, p99=0.22908
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(1.19e-07, 0.0448) | 263497 | ########################################
(0.0448 , 0.0895) | 1455 |
(0.0895 , 0.134 ) | 259559 | #######################################
(0.134 , 0.179 ) | 1692 |
(0.179 , 0.224 ) | 1301 |
(0.224 , 0.269 ) | 258896 | #######################################
(0.269 , 0.313 ) | 6 |
(0.313 , 0.358 ) | 4 |
(0.358 , 0.403 ) | 10 |
(0.403 , 0.448 ) | 12 |
[E] FAILED | Output: 'output' | Difference exceeds tolerance (rel=0.001, abs=0.001)
[E] FAILED | Mismatched outputs: ['output']
[E] Accuracy Summary | trt-runner-N0-03/13/25-15:32:52 vs. onnxrt-runner-N0-03/13/25-15:32:52 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 126.740s | Command: \\?\C:\Users\msrir\anaconda3\envs\onnx-opt\Scripts\polygraphy run model.onnx --trt --onnxrt --precision-constraints obey --save-engine model.engine --providers CUDAExecutionProvider --atol 1e-3 --rtol 1e-3
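One observation on the log itself: the maximum absolute difference (0.0016909) only slightly exceeds atol, while the maximum relative difference (0.44752) occurs where the output values are down near 1e-7. For values that small, even a tiny absolute discrepancy is enormous in relative terms. A toy illustration with assumed values in that range (the real tensors are not available to me):

```python
import numpy as np

# Assumed near-zero output values, in the same ballpark as the log's
# min values (~1.7e-07 for ONNX Runtime, ~1.8e-07 for TensorRT).
onnx_val = np.float32(1.7e-07)
trt_val  = np.float32(2.4e-07)

abs_diff = abs(trt_val - onnx_val)   # ~7e-08: negligible in absolute terms
rel_diff = abs_diff / abs(onnx_val)  # ~0.41: huge in relative terms
print(abs_diff, rel_diff)
```

If the downstream task only consumes an argmax over the class channel, a per-element rtol of 1e-3 on near-zero probabilities may be stricter than the application actually needs.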