Description
When I use Polygraphy to compare accuracy between TensorRT and ONNX Runtime, I see a strange accuracy loss, even though I have not enabled FP16 or INT8.
Why does this happen, and how can I track down the cause?
Environment
TensorRT Version: 8.2.2.1
GPU Type: Titan RTX
Nvidia Driver Version: 470.42.01
CUDA Version: 11.4
CUDNN Version: 8.2.1.32-1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable): none
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag): none
Relevant Files
Here is my model:
Steps To Reproduce
Here is my command line:
/path/to/anaconda3/envs/deployment/bin/polygraphy run /path/to/deployment/models/myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000
And the output is as follows:
[I] trt-runner-N0-01/18/22-20:00:38 | Activating and starting inference
[I] Configuring with profiles: [Profile().add(input.1, min=[2, 3, 384, 288], opt=[2, 3, 384, 288], max=[2, 3, 384, 288])]
[I] Building engine with configuration:
Workspace | 6000000000 bytes (5722.05 MiB)
Precision | TF32: False, FP16: False, INT8: False, Obey Precision Constraints: False, Strict Types: False
Tactic Sources | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
Safety Restricted | False
Profiles | 1 profile(s)
[01/18/2022-20:00:45] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[01/18/2022-20:01:27] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] Finished engine building in 47.574 seconds
[01/18/2022-20:01:28] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[01/18/2022-20:01:28] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] trt-runner-N0-01/18/22-20:00:38
---- Inference Input(s) ----
{input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] trt-runner-N0-01/18/22-20:00:38
---- Inference Output(s) ----
{2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] trt-runner-N0-01/18/22-20:00:38 | Completed 1 iteration(s) in 5.536 ms | Average inference time: 5.536 ms.
[I] onnxrt-runner-N0-01/18/22-20:00:38 | Activating and starting inference
[I] onnxrt-runner-N0-01/18/22-20:00:38
---- Inference Input(s) ----
{input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] onnxrt-runner-N0-01/18/22-20:00:38
---- Inference Output(s) ----
{2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] onnxrt-runner-N0-01/18/22-20:00:38 | Completed 1 iteration(s) in 154.6 ms | Average inference time: 154.6 ms.
[I] Accuracy Comparison | trt-runner-N0-01/18/22-20:00:38 vs. onnxrt-runner-N0-01/18/22-20:00:38
[I] Comparing Output: '2947' (dtype=float32, shape=(2, 17, 96, 72)) with '2947' (dtype=float32, shape=(2, 17, 96, 72)) | Tolerance: [abs=0.001, rel=0.001] | Checking elemwise error
[I] trt-runner-N0-01/18/22-20:00:38: 2947 | Stats: mean=55.534, std-dev=9.8689, var=97.396, median=54.608, min=16.535 at (0, 13, 0, 2), max=87.623 at (1, 4, 46, 68), avg-magnitude=55.534
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(-2.97e-05, 8.76) | 0 |
(8.76 , 17.5) | 4 |
(17.5 , 26.3) | 52 |
(26.3 , 35 ) | 1204 |
(35 , 43.8) | 26644 | ##############
(43.8 , 52.6) | 73105 | ########################################
(52.6 , 61.3) | 62114 | #################################
(61.3 , 70.1) | 53612 | #############################
(70.1 , 78.9) | 16940 | #########
(78.9 , 87.6) | 1333 |
[I] onnxrt-runner-N0-01/18/22-20:00:38: 2947 | Stats: mean=0.003365, std-dev=0.0045788, var=2.0965e-05, median=0.0019547, min=-2.9664e-05 at (1, 6, 92, 1), max=0.11542 at (1, 2, 18, 71), avg-magnitude=0.003365
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(-2.97e-05, 8.76) | 235008 | ########################################
(8.76 , 17.5) | 0 |
(17.5 , 26.3) | 0 |
(26.3 , 35 ) | 0 |
(35 , 43.8) | 0 |
(43.8 , 52.6) | 0 |
(52.6 , 61.3) | 0 |
(61.3 , 70.1) | 0 |
(70.1 , 78.9) | 0 |
(78.9 , 87.6) | 0 |
[I] Error Metrics: 2947
[I] Minimum Required Tolerance: elemwise error | [abs=87.622] OR [rel=1.6141e+07]
[I] Absolute Difference | Stats: mean=55.53, std-dev=9.8696, var=97.409, median=54.604, min=16.535 at (0, 13, 0, 2), max=87.622 at (1, 4, 46, 68), avg-magnitude=55.53
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(16.5, 23.6) | 21 |
(23.6, 30.8) | 215 |
(30.8, 37.9) | 3740 | ##
(37.9, 45 ) | 32283 | #####################
(45 , 52.1) | 60502 | ########################################
(52.1, 59.2) | 52110 | ##################################
(59.2, 66.3) | 47035 | ###############################
(66.3, 73.4) | 31473 | ####################
(73.4, 80.5) | 6917 | ####
(80.5, 87.6) | 712 |
[I] Relative Difference | Stats: mean=99830, std-dev=2.9623e+05, var=8.7751e+10, median=28881, min=577.86 at (1, 2, 18, 71), max=1.6141e+07 at (0, 11, 6, 71), avg-magnitude=99830
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(578 , 1.61e+06) | 231870 | ########################################
(1.61e+06, 3.23e+06) | 3070 |
(3.23e+06, 4.84e+06) | 52 |
(4.84e+06, 6.46e+06) | 4 |
(6.46e+06, 8.07e+06) | 6 |
(8.07e+06, 9.69e+06) | 2 |
(9.69e+06, 1.13e+07) | 0 |
(1.13e+07, 1.29e+07) | 1 |
(1.29e+07, 1.45e+07) | 1 |
(1.45e+07, 1.61e+07) | 2 |
[E] FAILED | Difference exceeds tolerance (rel=0.001, abs=0.001)
[E] FAILED | Mismatched outputs: ['2947']
[!] FAILED | Command: /nvme/chenjinwei/anaconda3/envs/deployment/bin/polygraphy run /nvme/chenjinwei/deployment/models/myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000
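For reference, this is how I understand the element-wise check reported above, written as a minimal NumPy sketch. It is not Polygraphy's actual code. The helper name `elemwise_errors` is hypothetical, and I am assuming two things from the log: that the relative difference is taken with respect to the ONNX Runtime output, and that an element only fails when it exceeds both tolerances (the log says "abs OR rel" for the minimum required tolerance). The arrays are synthetic values, not my real outputs.

```python
import numpy as np


def elemwise_errors(out, ref, atol=1e-3, rtol=1e-3):
    """Compare `out` against `ref` element by element (hypothetical helper)."""
    out = np.asarray(out, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)

    absdiff = np.abs(out - ref)
    # Assumption: relative error is measured against the reference output.
    with np.errstate(divide="ignore", invalid="ignore"):
        reldiff = absdiff / np.abs(ref)
    # 0/0 yields NaN; treat exact matches as zero relative error.
    reldiff = np.where(absdiff == 0, 0.0, reldiff)

    # Assumption: an element fails only if it exceeds BOTH tolerances.
    failed = (absdiff > atol) & (reldiff > rtol)
    return {
        "max_abs": float(absdiff.max()),
        "max_rel": float(reldiff.max()),
        "num_failed": int(failed.sum()),
    }


# Synthetic values only, chosen to mimic the scale gap seen in the log.
ref = np.array([0.002, 0.1])   # stand-in for the ONNX Runtime output
out = np.array([55.0, 87.0])   # stand-in for the TensorRT output
report = elemwise_errors(out, ref)
print(report)
```

In my real run the TensorRT output has a mean of 55.534 while the ONNX Runtime output has a mean of 0.003365, so every element fails this kind of check by a wide margin. That does not look like ordinary floating-point rounding to me.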
Thanks!