TensorRT accuracy loss when testing

Description

When I use Polygraphy to compare accuracy between TensorRT and ONNX Runtime, there is a strange accuracy loss, even though I have not enabled FP16 or INT8.
I would like to know why this occurs and how to fix it.

Environment

TensorRT Version: 8.2.2.1
GPU Type: Titan RTX
Nvidia Driver Version: 470.42.01
CUDA Version: 11.4
CUDNN Version: 8.2.1.32-1
Operating System + Version: ubuntu1804
Python Version (if applicable): 3.8
TensorFlow Version (if applicable): none
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag): none

Relevant Files

Here is my model:

Steps To Reproduce

Here is my command line:

/path/to/anaconda3/envs/deployment/bin/polygraphy run /path/to/deployment/models/myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000

And the output is as follows:

[I] trt-runner-N0-01/18/22-20:00:38     | Activating and starting inference
[I]     Configuring with profiles: [Profile().add(input.1, min=[2, 3, 384, 288], opt=[2, 3, 384, 288], max=[2, 3, 384, 288])]
[I] Building engine with configuration:
    Workspace            | 6000000000 bytes (5722.05 MiB)
    Precision            | TF32: False, FP16: False, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted    | False
    Profiles             | 1 profile(s)
[01/18/2022-20:00:45] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[01/18/2022-20:01:27] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] Finished engine building in 47.574 seconds
[01/18/2022-20:01:28] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[01/18/2022-20:01:28] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] trt-runner-N0-01/18/22-20:00:38
    ---- Inference Input(s) ----
    {input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] trt-runner-N0-01/18/22-20:00:38
    ---- Inference Output(s) ----
    {2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] trt-runner-N0-01/18/22-20:00:38     | Completed 1 iteration(s) in 5.536 ms | Average inference time: 5.536 ms.
[I] onnxrt-runner-N0-01/18/22-20:00:38  | Activating and starting inference
[I] onnxrt-runner-N0-01/18/22-20:00:38
    ---- Inference Input(s) ----
    {input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] onnxrt-runner-N0-01/18/22-20:00:38
    ---- Inference Output(s) ----
    {2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] onnxrt-runner-N0-01/18/22-20:00:38  | Completed 1 iteration(s) in 154.6 ms | Average inference time: 154.6 ms.
[I] Accuracy Comparison | trt-runner-N0-01/18/22-20:00:38 vs. onnxrt-runner-N0-01/18/22-20:00:38
[I]     Comparing Output: '2947' (dtype=float32, shape=(2, 17, 96, 72)) with '2947' (dtype=float32, shape=(2, 17, 96, 72)) | Tolerance: [abs=0.001, rel=0.001] | Checking elemwise error
[I]         trt-runner-N0-01/18/22-20:00:38: 2947 | Stats: mean=55.534, std-dev=9.8689, var=97.396, median=54.608, min=16.535 at (0, 13, 0, 2), max=87.623 at (1, 4, 46, 68), avg-magnitude=55.534
[I]             ---- Histogram ----
                Bin Range         |  Num Elems | Visualization
                (-2.97e-05, 8.76) |          0 |
                (8.76     , 17.5) |          4 |
                (17.5     , 26.3) |         52 |
                (26.3     , 35  ) |       1204 |
                (35       , 43.8) |      26644 | ##############
                (43.8     , 52.6) |      73105 | ########################################
                (52.6     , 61.3) |      62114 | #################################
                (61.3     , 70.1) |      53612 | #############################
                (70.1     , 78.9) |      16940 | #########
                (78.9     , 87.6) |       1333 |
[I]         onnxrt-runner-N0-01/18/22-20:00:38: 2947 | Stats: mean=0.003365, std-dev=0.0045788, var=2.0965e-05, median=0.0019547, min=-2.9664e-05 at (1, 6, 92, 1), max=0.11542 at (1, 2, 18, 71), avg-magnitude=0.003365
[I]             ---- Histogram ----
                Bin Range         |  Num Elems | Visualization
                (-2.97e-05, 8.76) |     235008 | ########################################
                (8.76     , 17.5) |          0 |
                (17.5     , 26.3) |          0 |
                (26.3     , 35  ) |          0 |
                (35       , 43.8) |          0 |
                (43.8     , 52.6) |          0 |
                (52.6     , 61.3) |          0 |
                (61.3     , 70.1) |          0 |
                (70.1     , 78.9) |          0 |
                (78.9     , 87.6) |          0 |
[I]         Error Metrics: 2947
[I]             Minimum Required Tolerance: elemwise error | [abs=87.622] OR [rel=1.6141e+07]
[I]             Absolute Difference | Stats: mean=55.53, std-dev=9.8696, var=97.409, median=54.604, min=16.535 at (0, 13, 0, 2), max=87.622 at (1, 4, 46, 68), avg-magnitude=55.53
[I]                 ---- Histogram ----
                    Bin Range    |  Num Elems | Visualization
                    (16.5, 23.6) |         21 |
                    (23.6, 30.8) |        215 |
                    (30.8, 37.9) |       3740 | ##
                    (37.9, 45  ) |      32283 | #####################
                    (45  , 52.1) |      60502 | ########################################
                    (52.1, 59.2) |      52110 | ##################################
                    (59.2, 66.3) |      47035 | ###############################
                    (66.3, 73.4) |      31473 | ####################
                    (73.4, 80.5) |       6917 | ####
                    (80.5, 87.6) |        712 |
[I]             Relative Difference | Stats: mean=99830, std-dev=2.9623e+05, var=8.7751e+10, median=28881, min=577.86 at (1, 2, 18, 71), max=1.6141e+07 at (0, 11, 6, 71), avg-magnitude=99830
[I]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (578     , 1.61e+06) |     231870 | ########################################
                    (1.61e+06, 3.23e+06) |       3070 |
                    (3.23e+06, 4.84e+06) |         52 |
                    (4.84e+06, 6.46e+06) |          4 |
                    (6.46e+06, 8.07e+06) |          6 |
                    (8.07e+06, 9.69e+06) |          2 |
                    (9.69e+06, 1.13e+07) |          0 |
                    (1.13e+07, 1.29e+07) |          1 |
                    (1.29e+07, 1.45e+07) |          1 |
                    (1.45e+07, 1.61e+07) |          2 |
[E]         FAILED | Difference exceeds tolerance (rel=0.001, abs=0.001)
[E]     FAILED | Mismatched outputs: ['2947']
[!] FAILED | Command: /nvme/chenjinwei/anaconda3/envs/deployment/bin/polygraphy run /nvme/chenjinwei/deployment/models/myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000
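For reference, the PASS/FAIL verdict in the log above follows an elementwise rule: an element is acceptable if its absolute error is within atol OR its relative error is within rtol, and the output fails if any element violates both. Below is a minimal sketch of that rule (an illustrative approximation, not Polygraphy's actual implementation; the helper name is made up):

```python
import numpy as np

def elemwise_check(out, expected, atol=1e-3, rtol=1e-3):
    """Approximate Polygraphy-style elementwise check: an element passes
    if its absolute OR relative error is within tolerance."""
    absdiff = np.abs(out - expected)
    reldiff = absdiff / (np.abs(expected) + 1e-38)  # guard against divide-by-zero
    passed = (absdiff <= atol) | (reldiff <= rtol)
    return bool(np.all(passed)), float(absdiff.max()), float(reldiff.max())

# Example mirroring the failing log: TRT values around 55 vs. ONNX values around 0.003.
trt_out = np.full((2, 3), 55.5, dtype=np.float32)
onnx_out = np.full((2, 3), 0.003365, dtype=np.float32)
ok, max_abs, max_rel = elemwise_check(trt_out, onnx_out)
print(ok)  # False: the ~55.5 absolute error dwarfs atol=1e-3
```

This is why the log reports a "Minimum Required Tolerance" of abs=87.622 OR rel=1.6141e+07: either tolerance would have to be raised to at least that value for every element to pass.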

Thanks!

Hi,
Can you try running your model with the trtexec command and share the --verbose log in case the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the link below for the list of supported operators; if any operator is not supported, you will need to create a custom plugin to support that operation.

Also, we request that you share your model and script, if not shared already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:

Thanks!

Hi,
Thanks for your reply, but the issue persists.
First, I try to run my onnx model with trtexec, here is my command:

./trtexec --onnx=/nvme/chenjinwei/deployment/models/myhrnet154out_2x3x384x288.onnx --verbose

The output is too long to post here, but there are no errors. I have shared my model above (see Relevant Files in the first post). There is no script; I just use the trtexec and polygraphy tools, so perhaps you can reproduce the issue.
I believe every operator in my model is supported, because I converted the model successfully on another server with CUDA 10.2, and it passed the accuracy comparison there.
I also read the docs you linked, and I found no issue like mine.
I really need your help!
Thanks!

Hi @user129339,

Thank you for reporting this issue.
Our team will work on it.

Hi @user129339 , I’m from the TensorRT team and I’d like to help fix the problem you’re facing. First, I’m working on reproducing the failure you observed, but I’m not seeing the same issue.
My environment is similar to yours:
GPU: Titan RTX
Cuda: 11.4
Cudnn: 8.2.1.32
Cublas: 11.5.2
OS: ubuntu18.04
python: 3.6

To further diagnose where the problem may be coming from, it would be great if you could try a few experiments.

  1. Disable tactic sources. Three tactic sources are enabled by default: cudnn, cublas, and cublaslt. Try disabling each one, one at a time, to see whether any of them in particular is where the problem lies. There is a command-line option to explicitly enable selected tactic sources, --tactic-sources; for example, to enable only cudnn and cublas, you can use --tactic-sources cudnn cublas. Let me know if there is a tactic source that is causing the problem.
  2. In case it is due to an environment/installation problem, try one of our containers. We have containers available that are packaged with TensorRT 8.2.2: see https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_22-01.html#rel_22-01 . The main requirement is that you will need to update your NVIDIA driver to version >= 510, if you can. Once you have the container running, please run your model again to check whether the accuracy issues persist.
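The leave-one-out sweep in step 1 can be scripted. The sketch below (an illustrative helper, not part of Polygraphy; the model path is a placeholder and the tactic-source names follow the spelling used above) just builds and prints one polygraphy command line per pair of tactic sources, so each run leaves exactly one source disabled:

```python
from itertools import combinations

# Placeholder model path -- substitute your own ONNX file.
MODEL = "myhrnet154out_2x3x384x288.onnx"
BASE = ["polygraphy", "run", MODEL, "--trt", "--onnxrt",
        "--rtol", "1e-03", "--atol", "1e-03"]

def ablation_commands(sources):
    """Return one command per (n-1)-sized subset of tactic sources,
    i.e. each command disables exactly one source."""
    return [BASE + ["--tactic-sources", *keep]
            for keep in combinations(sources, len(sources) - 1)]

for cmd in ablation_commands(["cublas", "cublaslt", "cudnn"]):
    print(" ".join(cmd))
```

Comparing which of the three runs fails (if any) points at the tactic source whose kernels are producing the bad output.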

For reference, this is the output that I’m observing.

sirej@78c54f2459f5:~/trt$ polygraphy run myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000
[I] trt-runner-N0-02/17/22-13:41:31     | Activating and starting inference
[I]     Configuring with profiles: [Profile().add(input.1, min=[2, 3, 384, 288], opt=[2, 3, 384, 288], max=[2, 3, 384, 288])]
[I] Building engine with configuration:
    Workspace            | 6000000000 bytes (5722.05 MiB)
    Precision            | TF32: False, FP16: False, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted    | False
    Profiles             | 1 profile(s)
[02/17/2022-13:41:35] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.2
[02/17/2022-13:42:08] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] Finished engine building in 35.205 seconds
[02/17/2022-13:42:09] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.2
[02/17/2022-13:42:09] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.2
[I] trt-runner-N0-02/17/22-13:41:31    
    ---- Inference Input(s) ----
    {input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] trt-runner-N0-02/17/22-13:41:31    
    ---- Inference Output(s) ----
    {2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] trt-runner-N0-02/17/22-13:41:31     | Completed 1 iteration(s) in 20.17 ms | Average inference time: 20.17 ms.
[I] onnxrt-runner-N0-02/17/22-13:41:31  | Activating and starting inference
[I] onnxrt-runner-N0-02/17/22-13:41:31 
    ---- Inference Input(s) ----
    {input.1 [dtype=float32, shape=(2, 3, 384, 288)]}
[I] onnxrt-runner-N0-02/17/22-13:41:31 
    ---- Inference Output(s) ----
    {2947 [dtype=float32, shape=(2, 17, 96, 72)]}
[I] onnxrt-runner-N0-02/17/22-13:41:31  | Completed 1 iteration(s) in 184 ms | Average inference time: 184 ms.
[I] Accuracy Comparison | trt-runner-N0-02/17/22-13:41:31 vs. onnxrt-runner-N0-02/17/22-13:41:31
[I]     Comparing Output: '2947' (dtype=float32, shape=(2, 17, 96, 72)) with '2947' (dtype=float32, shape=(2, 17, 96, 72)) | Tolerance: [abs=0.001, rel=0.001] | Checking elemwise error
[I]         trt-runner-N0-02/17/22-13:41:31: 2947 | Stats: mean=0.003365, std-dev=0.0045788, var=2.0965e-05, median=0.0019547, min=-2.9663e-05 at (1, 6, 92, 1), max=0.11542 at (1, 2, 18, 71), avg-magnitude=0.003365
[I]         onnxrt-runner-N0-02/17/22-13:41:31: 2947 | Stats: mean=0.003365, std-dev=0.0045788, var=2.0965e-05, median=0.0019547, min=-2.9663e-05 at (1, 6, 92, 1), max=0.11542 at (1, 2, 18, 71), avg-magnitude=0.003365
[I]         Error Metrics: 2947
[I]             Minimum Required Tolerance: elemwise error | [abs=3.7253e-07] OR [rel=0.00049604] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=5.97e-09, std-dev=1.3299e-08, var=1.7685e-16, median=1.1642e-09, min=0 at (0, 0, 0, 0), max=3.7253e-07 at (1, 2, 17, 71), avg-magnitude=5.97e-09
[I]             Relative Difference | Stats: mean=1.5915e-06, std-dev=3.673e-06, var=1.3491e-11, median=7.2706e-07, min=0 at (0, 0, 0, 0), max=0.00049604 at (0, 2, 87, 12), avg-magnitude=1.5915e-06
[I]         PASSED | Difference is within tolerance (rel=0.001, abs=0.001)
[I]     PASSED | All outputs matched | Outputs: ['2947']
[I] PASSED | Command: /home/sirej/.local/bin/polygraphy run myhrnet154out_2x3x384x288.onnx --trt --onnxrt --rtol 1e-03 --atol 1e-03 --workspace 6000000000

@user129339 In addition, TensorRT 8.4.0 was released last week so you can try that out to see if it fixes your problem: https://developer.nvidia.com/nvidia-tensorrt-8x-download

Hi @user129339,

We are looking forward to your response. Could you please check and confirm whether it works for you on version 8.4.0?

Thank you.