TensorRT accuracy loss when testing

Description

When I use Polygraphy to compare accuracy between TensorRT and ONNX Runtime, there is a strange accuracy loss, even though I have not enabled FP16 or INT8.
I would like to know why this occurs and how to fix it.

Environment

TensorRT Version: 8.2.2.1
GPU Type: Titan RTX
Nvidia Driver Version: 470.42.01
CUDA Version: 11.4
CUDNN Version: 8.2.1.32-1
Operating System + Version: ubuntu1804
Python Version (if applicable): 3.8
TensorFlow Version (if applicable): none
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag): none

Relevant Files

Here is my model:

Steps To Reproduce

Here is my command line:

/path/to/anaconda3/envs/deployment/bin/polygraphy run /path/to/deployment/models/myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000

And the output is as follows:

[I] trt-runner-N0-01/18/22-20:00:38     | Activating and starting inference
[I]     Configuring with profiles: [Profile().add(input.1, min=[2, 3, 384, 288], opt=[2, 3, 384, 288], max=[2, 3, 384, 288])]
[I] Building engine with configuration:
    Workspace            | 6000000000 bytes (5722.05 MiB)
    Precision            | TF32: False, FP16: False, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted    | False
    Profiles             | 1 profile(s)
[01/18/2022-20:00:45] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[01/18/2022-20:01:27] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] Finished engine building in 47.574 seconds
[01/18/2022-20:01:28] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[01/18/2022-20:01:28] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] trt-runner-N0-01/18/22-20:00:38
    ---- Inference Input(s) ----
    {input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] trt-runner-N0-01/18/22-20:00:38
    ---- Inference Output(s) ----
    {2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] trt-runner-N0-01/18/22-20:00:38     | Completed 1 iteration(s) in 5.536 ms | Average inference time: 5.536 ms.
[I] onnxrt-runner-N0-01/18/22-20:00:38  | Activating and starting inference
[I] onnxrt-runner-N0-01/18/22-20:00:38
    ---- Inference Input(s) ----
    {input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] onnxrt-runner-N0-01/18/22-20:00:38
    ---- Inference Output(s) ----
    {2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] onnxrt-runner-N0-01/18/22-20:00:38  | Completed 1 iteration(s) in 154.6 ms | Average inference time: 154.6 ms.
[I] Accuracy Comparison | trt-runner-N0-01/18/22-20:00:38 vs. onnxrt-runner-N0-01/18/22-20:00:38
[I]     Comparing Output: '2947' (dtype=float32, shape=(2, 17, 96, 72)) with '2947' (dtype=float32, shape=(2, 17, 96, 72)) | Tolerance: [abs=0.001, rel=0.001] | Checking elemwise error
[I]         trt-runner-N0-01/18/22-20:00:38: 2947 | Stats: mean=55.534, std-dev=9.8689, var=97.396, median=54.608, min=16.535 at (0, 13, 0, 2), max=87.623 at (1, 4, 46, 68), avg-magnitude=55.534
[I]             ---- Histogram ----
                Bin Range         |  Num Elems | Visualization
                (-2.97e-05, 8.76) |          0 |
                (8.76     , 17.5) |          4 |
                (17.5     , 26.3) |         52 |
                (26.3     , 35  ) |       1204 |
                (35       , 43.8) |      26644 | ##############
                (43.8     , 52.6) |      73105 | ########################################
                (52.6     , 61.3) |      62114 | #################################
                (61.3     , 70.1) |      53612 | #############################
                (70.1     , 78.9) |      16940 | #########
                (78.9     , 87.6) |       1333 |
[I]         onnxrt-runner-N0-01/18/22-20:00:38: 2947 | Stats: mean=0.003365, std-dev=0.0045788, var=2.0965e-05, median=0.0019547, min=-2.9664e-05 at (1, 6, 92, 1), max=0.11542 at (1, 2, 18, 71), avg-magnitude=0.003365
[I]             ---- Histogram ----
                Bin Range         |  Num Elems | Visualization
                (-2.97e-05, 8.76) |     235008 | ########################################
                (8.76     , 17.5) |          0 |
                (17.5     , 26.3) |          0 |
                (26.3     , 35  ) |          0 |
                (35       , 43.8) |          0 |
                (43.8     , 52.6) |          0 |
                (52.6     , 61.3) |          0 |
                (61.3     , 70.1) |          0 |
                (70.1     , 78.9) |          0 |
                (78.9     , 87.6) |          0 |
[I]         Error Metrics: 2947
[I]             Minimum Required Tolerance: elemwise error | [abs=87.622] OR [rel=1.6141e+07]
[I]             Absolute Difference | Stats: mean=55.53, std-dev=9.8696, var=97.409, median=54.604, min=16.535 at (0, 13, 0, 2), max=87.622 at (1, 4, 46, 68), avg-magnitude=55.53
[I]                 ---- Histogram ----
                    Bin Range    |  Num Elems | Visualization
                    (16.5, 23.6) |         21 |
                    (23.6, 30.8) |        215 |
                    (30.8, 37.9) |       3740 | ##
                    (37.9, 45  ) |      32283 | #####################
                    (45  , 52.1) |      60502 | ########################################
                    (52.1, 59.2) |      52110 | ##################################
                    (59.2, 66.3) |      47035 | ###############################
                    (66.3, 73.4) |      31473 | ####################
                    (73.4, 80.5) |       6917 | ####
                    (80.5, 87.6) |        712 |
[I]             Relative Difference | Stats: mean=99830, std-dev=2.9623e+05, var=8.7751e+10, median=28881, min=577.86 at (1, 2, 18, 71), max=1.6141e+07 at (0, 11, 6, 71), avg-magnitude=99830
[I]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (578     , 1.61e+06) |     231870 | ########################################
                    (1.61e+06, 3.23e+06) |       3070 |
                    (3.23e+06, 4.84e+06) |         52 |
                    (4.84e+06, 6.46e+06) |          4 |
                    (6.46e+06, 8.07e+06) |          6 |
                    (8.07e+06, 9.69e+06) |          2 |
                    (9.69e+06, 1.13e+07) |          0 |
                    (1.13e+07, 1.29e+07) |          1 |
                    (1.29e+07, 1.45e+07) |          1 |
                    (1.45e+07, 1.61e+07) |          2 |
[E]         FAILED | Difference exceeds tolerance (rel=0.001, abs=0.001)
[E]     FAILED | Mismatched outputs: ['2947']
[!] FAILED | Command: /nvme/chenjinwei/anaconda3/envs/deployment/bin/polygraphy run /nvme/chenjinwei/deployment/models/myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000
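For reference, the PASS/FAIL verdict in the log above follows an elementwise rule: an element is acceptable if its absolute error is within atol OR its relative error is within rtol, and the output fails if any element violates both. Below is a minimal sketch of that rule (an illustrative approximation, not Polygraphy's actual implementation; the helper name is made up):

```python
import numpy as np

def elemwise_check(out, expected, atol=1e-3, rtol=1e-3):
    """Approximate Polygraphy-style elementwise check: an element passes
    if its absolute OR relative error is within tolerance."""
    absdiff = np.abs(out - expected)
    reldiff = absdiff / (np.abs(expected) + 1e-38)  # guard against divide-by-zero
    passed = (absdiff <= atol) | (reldiff <= rtol)
    return bool(np.all(passed)), float(absdiff.max()), float(reldiff.max())

# Example mirroring the failing log: TRT values around 55 vs. ONNX values around 0.003.
trt_out = np.full((2, 3), 55.5, dtype=np.float32)
onnx_out = np.full((2, 3), 0.003365, dtype=np.float32)
ok, max_abs, max_rel = elemwise_check(trt_out, onnx_out)
print(ok)  # False: the ~55.5 absolute error dwarfs atol=1e-3
```

This is why the log reports a "Minimum Required Tolerance" of abs=87.622 OR rel=1.6141e+07: either tolerance would have to be raised to at least that value for every element to pass.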

Thanks!

Hi,
Can you try running your model with the trtexec command and share the --verbose log in case the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the link below for the list of supported operators; if any operator is not supported, you will need to create a custom plugin to support that operation.

Also, we request that you share your model and script, if not shared already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:

Thanks!

Hi,
Thanks for your reply, but the issue persists.
First, I try to run my onnx model with trtexec, here is my command:

./trtexec --onnx=/nvme/chenjinwei/deployment/models/myhrnet154out_2x3x384x288.onnx --verbose

The output is too long to post here, but there are no errors. I have shared my model above (see Relevant Files in the first post). There is no script; I just use the trtexec and polygraphy tools, so perhaps you can reproduce the issue.
I believe every operator in my model is supported, because I converted the model successfully on another server with CUDA 10.2, and it passed the accuracy comparison there.
I also read the docs you linked, and I found no issue like mine.
I really need your help!
Thanks!

Hi @user129339,

Thank you for reporting this issue.
Our team will work on it.

Hi @user129339 , I’m from the TensorRT team and I’d like to help fix the problem you’re facing. First, I’m working on reproducing the failure you observed, but I’m not seeing the same issue.
My environment is similar to yours:
GPU: Titan RTX
Cuda: 11.4
Cudnn: 8.2.1.32
Cublas: 11.5.2
OS: ubuntu18.04
python: 3.6

To further diagnose where the problem may be coming from, it would be great if you could try a few experiments.

  1. Disable tactic sources. Three tactic sources are enabled by default: cudnn, cublas, and cublaslt. Try disabling each one, one at a time, to see whether any of them in particular is where the problem lies. There is a command-line option to explicitly enable selected tactic sources, --tactic-sources; for example, to enable only cudnn and cublas, you can use --tactic-sources cudnn cublas. Let me know if there is a tactic source that is causing the problem.
  2. In case it is due to an environment/installation problem, try one of our containers. We have containers available that are packaged with TensorRT 8.2.2: see https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_22-01.html#rel_22-01 . The main requirement is that you will need to update your NVIDIA driver to version >= 510, if you can. Once you have the container running, please run your model again to check whether the accuracy issues persist.
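The leave-one-out sweep in step 1 can be scripted. The sketch below (an illustrative helper, not part of Polygraphy; the model path is a placeholder and the tactic-source names follow the spelling used above) just builds and prints one polygraphy command line per pair of tactic sources, so each run leaves exactly one source disabled:

```python
from itertools import combinations

# Placeholder model path -- substitute your own ONNX file.
MODEL = "myhrnet154out_2x3x384x288.onnx"
BASE = ["polygraphy", "run", MODEL, "--trt", "--onnxrt",
        "--rtol", "1e-03", "--atol", "1e-03"]

def ablation_commands(sources):
    """Return one command per (n-1)-sized subset of tactic sources,
    i.e. each command disables exactly one source."""
    return [BASE + ["--tactic-sources", *keep]
            for keep in combinations(sources, len(sources) - 1)]

for cmd in ablation_commands(["cublas", "cublaslt", "cudnn"]):
    print(" ".join(cmd))
```

Comparing which of the three runs fails (if any) points at the tactic source whose kernels are producing the bad output.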

For reference, this is the output that I’m observing.

sirej@78c54f2459f5:~/trt$ polygraphy run myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000
[I] trt-runner-N0-02/17/22-13:41:31     | Activating and starting inference
[I]     Configuring with profiles: [Profile().add(input.1, min=[2, 3, 384, 288], opt=[2, 3, 384, 288], max=[2, 3, 384, 288])]
[I] Building engine with configuration:
    Workspace            | 6000000000 bytes (5722.05 MiB)
    Precision            | TF32: False, FP16: False, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted    | False
    Profiles             | 1 profile(s)
[02/17/2022-13:41:35] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.2
[02/17/2022-13:42:08] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] Finished engine building in 35.205 seconds
[02/17/2022-13:42:09] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.2
[02/17/2022-13:42:09] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] trt-runner-N0-02/17/22-13:41:31    
    ---- Inference Input(s) ----
    {input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] trt-runner-N0-02/17/22-13:41:31    
    ---- Inference Output(s) ----
    {2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] trt-runner-N0-02/17/22-13:41:31     | Completed 1 iteration(s) in 20.17 ms | Average inference time: 20.17 ms.
[I] onnxrt-runner-N0-02/17/22-13:41:31  | Activating and starting inference
[I] onnxrt-runner-N0-02/17/22-13:41:31 
    ---- Inference Input(s) ----
    {input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] onnxrt-runner-N0-02/17/22-13:41:31 
    ---- Inference Output(s) ----
    {2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] onnxrt-runner-N0-02/17/22-13:41:31  | Completed 1 iteration(s) in 184 ms | Average inference time: 184 ms.
[I] Accuracy Comparison | trt-runner-N0-02/17/22-13:41:31 vs. onnxrt-runner-N0-02/17/22-13:41:31
[I]     Comparing Output: '2947' (dtype=float32, shape=(2, 17, 96, 72)) with '2947' (dtype=float32, shape=(2, 17, 96, 72)) | Tolerance: [abs=0.001, rel=0.001] | Checking elemwise error
[I]         trt-runner-N0-02/17/22-13:41:31: 2947 | Stats: mean=0.003365, std-dev=0.0045788, var=2.0965e-05, median=0.0019547, min=-2.9663e-05 at (1, 6, 92, 1), max=0.11542 at (1, 2, 18, 71), avg-magnitude=0.003365
[I]         onnxrt-runner-N0-02/17/22-13:41:31: 2947 | Stats: mean=0.003365, std-dev=0.0045788, var=2.0965e-05, median=0.0019547, min=-2.9663e-05 at (1, 6, 92, 1), max=0.11542 at (1, 2, 18, 71), avg-magnitude=0.003365
[I]         Error Metrics: 2947
[I]             Minimum Required Tolerance: elemwise error | [abs=3.7253e-07] OR [rel=0.00049604] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=5.97e-09, std-dev=1.3299e-08, var=1.7685e-16, median=1.1642e-09, min=0 at (0, 0, 0, 0), max=3.7253e-07 at (1, 2, 17, 71), avg-magnitude=5.97e-09
[I]             Relative Difference | Stats: mean=1.5915e-06, std-dev=3.673e-06, var=1.3491e-11, median=7.2706e-07, min=0 at (0, 0, 0, 0), max=0.00049604 at (0, 2, 87, 12), avg-magnitude=1.5915e-06
[I]         PASSED | Difference is within tolerance (rel=0.001, abs=0.001)
[I]     PASSED | All outputs matched | Outputs: ['2947']
[I] PASSED | Command: /home/sirej/.local/bin/polygraphy run myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000

@user129339 In addition, TensorRT 8.4.0 was released last week so you can try that out to see if it fixes your problem: https://developer.nvidia.com/nvidia-tensorrt-8x-download

Hi @user129339,

We are looking forward to your response. Could you please check and confirm whether it works for you on version 8.4.0?

Thank you.