Incorrect outputs for dynamic height and width

Description


Environment

TensorRT Version: 8.4.3.1
GPU Type: 3090
Nvidia Driver Version: 510.47.03
CUDA Version: 11.6
CUDNN Version: 8.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.12.1
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

python export_dynamic_shape_test.py --build
python dynamic_shape_problem.py --run

Please include:

  • I want to make the height and width dynamic (from 32 to 64), but I only get the correct output when the width is fixed at 64 or when both height and width are 32.
  • The demo script is above and makes the issue easy to reproduce.

Any help?

Hi,

Please make sure the ONNX model is correctly exported with dynamic shapes (-1).
For your reference (optional), see "Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime" in the PyTorch Tutorials 1.12.1+cu102 documentation.
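
For illustration, here is a minimal sketch of an export that marks height and width as dynamic. It assumes a 4-D input named `input` and an output named `output`, which are the tensor names that appear in the Polygraphy logs later in this thread. The helper name `export_dynamic`, the dummy shape and the opset version are assumptions and are not taken from the poster's script.

```python
# Axes 2 and 3 (height, width) are marked dynamic; batch and channels stay fixed.
DYNAMIC_AXES = {
    "input": {2: "height", 3: "width"},
    "output": {2: "height", 3: "width"},
}


def export_dynamic(model, onnx_path, opset_version=13):
    """Export `model` to ONNX with dynamic height and width (hypothetical helper)."""
    import torch  # imported lazily so this module loads without PyTorch installed

    # The dummy shape only seeds tracing; 1x2x64x64 is an assumed example shape.
    dummy = torch.randn(1, 2, 64, 64)
    torch.onnx.export(
        model,
        dummy,
        onnx_path,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes=DYNAMIC_AXES,
        opset_version=opset_version,
    )
```

In the exported graph, the dynamic dimensions should then show up as named symbols (here `height` and `width`) rather than as fixed integers.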

Could you please share with us the ONNX model generated?

Thank you.

I have set the dynamic shape (-1) in ONNX,
and you can easily generate the ONNX model from the script above.

I have uploaded the ONNX model to GitHub.

I also found that using multiple optimization profiles works around this issue, but it doesn't make sense to me why a single profile does not work.
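
The poster's multi-profile configuration is not shown in the thread. As a rough sketch of one possible arrangement, the code below builds one optimization profile per width, so that only the height varies inside each profile. The helper names, the default batch and channel sizes (taken from the `input:[1,2,32,32]` shapes in the Polygraphy command later in this thread) and the choice of `opt == max` are all assumptions.

```python
def profile_shapes_per_width(widths, h_min=32, h_max=64, batch=1, channels=2):
    """Return one (min, opt, max) shape triple per width; only height is dynamic."""
    triples = []
    for w in widths:
        min_shape = (batch, channels, h_min, w)
        max_shape = (batch, channels, h_max, w)
        # Using the max shape as the opt shape is an assumption of this sketch.
        triples.append((min_shape, max_shape, max_shape))
    return triples


def add_profiles(builder, config, triples, input_name="input"):
    """Register each triple as its own TensorRT optimization profile.

    `builder` is a tensorrt.Builder and `config` a tensorrt.IBuilderConfig.
    """
    for min_shape, opt_shape, max_shape in triples:
        profile = builder.create_optimization_profile()
        profile.set_shape(input_name, min_shape, opt_shape, max_shape)
        config.add_optimization_profile(profile)
```

With this layout, the execution context has to select the profile that matches the requested width before the input shape is set at inference time.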

Script to reproduce this issue moved to

Hi,

When we tested with the Polygraphy tool using TensorRT, we observed NO accuracy drop compared with ONNX-Runtime.
So please verify that your model and the inference script are correct.

[V] Loaded Module: tensorrt | Version: 8.5.0.12 | Path: ['/usr/local/lib/python3.8/dist-packages/tensorrt']
[I] Accuracy Comparison | trt-runner-N0-11/15/22-05:21:35 vs. onnxrt-runner-N0-11/15/22-05:21:35

[I]         PASSED | Output: 'output' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['output']
[I] Accuracy Summary | trt-runner-N0-11/15/22-05:21:35 vs. onnxrt-runner-N0-11/15/22-05:21:35 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 14.610s | Command: /usr/local/bin/polygraphy run dytest.onnx --trt --onnxrt --workspace=20G --verbose

And when we executed your script, we observed the following output.

===> test for height: 32, width: 32, MSE: 3.3729857811291104e-17
===> test for height: 40, width: 32, MSE: 0.0004897538456134498
===> test for height: 48, width: 32, MSE: 0.0005678070592693985
===> test for height: 56, width: 32, MSE: 0.0011687211226671934
===> test for height: 64, width: 32, MSE: 0.0005086237215436995
===> test for height: 32, width: 40, MSE: 0.0006843135925009847
===> test for height: 40, width: 40, MSE: 0.0006494756671600044
===> test for height: 48, width: 40, MSE: 0.000591201358474791
===> test for height: 56, width: 40, MSE: 0.0005824036779813468
===> test for height: 64, width: 40, MSE: 0.0006070142844691873
===> test for height: 32, width: 48, MSE: 0.000581496802624315
===> test for height: 40, width: 48, MSE: 0.0005776156904175878
===> test for height: 48, width: 48, MSE: 0.0005951382336206734
===> test for height: 56, width: 48, MSE: 0.0005664692143909633
===> test for height: 64, width: 48, MSE: 0.0005567250773310661
===> test for height: 32, width: 56, MSE: 0.000705624814145267
===> test for height: 40, width: 56, MSE: 0.0005878353258594871
===> test for height: 48, width: 56, MSE: 0.0007057514158077538
===> test for height: 56, width: 56, MSE: 0.0007036780589260161
===> test for height: 64, width: 56, MSE: 0.0006398806581273675
===> test for height: 32, width: 64, MSE: 2.0302741162252688e-17
===> test for height: 40, width: 64, MSE: 2.3265697153333958e-17
===> test for height: 48, width: 64, MSE: 2.1107577972099433e-17
===> test for height: 56, width: 64, MSE: 1.760296426358374e-17
===> test for height: 64, width: 64, MSE: 1.7582084570561676e-17
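
A sweep like the one above can be sketched as follows. This is not the poster's script: `run_test` and `run_reference` are hypothetical callables that take `(height, width)` and return the flattened output of the engine under test and of a reference runtime, respectively.

```python
from itertools import product


def mse(a, b):
    """Mean squared error between two equal-length flat sequences of floats."""
    if len(a) != len(b):
        raise ValueError("outputs differ in length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def sweep(run_test, run_reference, sizes=range(32, 65, 8)):
    """Yield (height, width, mse) per size pair, with width in the outer loop."""
    for width, height in product(sizes, sizes):
        error = mse(run_test(height, width), run_reference(height, width))
        yield height, width, error
```

Printing each yielded tuple gives one line per shape in the same order as the listing above, which makes it easy to see that the error is near zero only for 32x32 and for every height at width 64.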

Thank you.

Yes, I mean that for the fixed shape it is totally correct, but not for dynamic width.

Frankly, I have used Polygraphy to debug the precision, and it reports that something is wrong with the Transpose layer, which I cannot understand.

[I]     Comparing Output: 'onnx::Transpose_239' (dtype=float32, shape=(1, 64, 32, 2)) with 'onnx::Transpose_239' (dtype=float32, shape=(1, 64, 32, 2))
[I]     Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-11/15/22-18:03:01: onnx::Transpose_239 | Stats: mean=0.011494, std-dev=0.028912, var=0.00083588, median=0.011251, min=-0.19488 at (0, 11, 0, 0), max=0.20742 at (0, 4, 3, 0), avg-magnitude=0.020177
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.195 , -0.155 ) |          3 | 
                (-0.155 , -0.114 ) |         10 | 
                (-0.114 , -0.0742) |         53 | 
                (-0.0742, -0.034 ) |        151 | #
                (-0.034 , 0.00627) |        290 | ###
                (0.00627, 0.0465 ) |       3301 | ########################################
                (0.0465 , 0.0867 ) |        193 | ##
                (0.0867 , 0.127  ) |         65 | 
                (0.127  , 0.167  ) |         28 | 
                (0.167  , 0.207  ) |          2 | 
[I]         onnxrt-runner-N0-11/15/22-18:03:01: onnx::Transpose_239 | Stats: mean=0.011458, std-dev=0.04055, var=0.0016443, median=0.011217, min=-0.19488 at (0, 22, 0, 0), max=0.20742 at (0, 8, 3, 0), avg-magnitude=0.0292
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.195 , -0.155 ) |          5 | 
                (-0.155 , -0.114 ) |         22 | 
                (-0.114 , -0.0742) |        109 | #
                (-0.0742, -0.034 ) |        315 | ####
                (-0.034 , 0.00627) |        550 | ########
                (0.00627, 0.0465 ) |       2525 | ########################################
                (0.0465 , 0.0867 ) |        381 | ######
                (0.0867 , 0.127  ) |        142 | ##
                (0.127  , 0.167  ) |         45 | 
                (0.167  , 0.207  ) |          2 | 
[I]         Error Metrics: onnx::Transpose_239
[I]             Minimum Required Tolerance: elemwise error | [abs=0.35695] OR [rel=906.73] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.028014, std-dev=0.039618, var=0.0015695, median=0.0053119, min=0 at (0, 0, 0, 0), max=0.35695 at (0, 22, 0, 0), avg-magnitude=0.028014
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (0     , 0.0357) |       2903 | ########################################
                    (0.0357, 0.0714) |        610 | ########
                    (0.0714, 0.107 ) |        359 | ####
                    (0.107 , 0.143 ) |        142 | #
                    (0.143 , 0.178 ) |         61 | 
                    (0.178 , 0.214 ) |         15 | 
                    (0.214 , 0.25  ) |          2 | 
                    (0.25  , 0.286 ) |          3 | 
                    (0.286 , 0.321 ) |          0 | 
                    (0.321 , 0.357 ) |          1 | 
[!] FAILED | Command: /docker/miniconda3/envs/faceshifter/bin/polygraphy run dytest.onnx --trt --onnxrt --onnx-outputs mark all --trt-outputs mark all --verbose --trt-min-shapes input:[1,2,32,32] --trt-opt-shapes input:[1,2,64,64] --trt-max-shapes input:[1,2,64,64] --load-inputs custom_inputs.json --atol 1e-05 --rtol 1e-05

[I]             Relative Difference | Stats: mean=1.8051, std-dev=16.308, var=265.95, median=0.42768, min=0 at (0, 0, 0, 0), max=906.73 at (0, 21, 9, 0), avg-magnitude=1.8051
[I]                 ---- Histogram ----
                    Bin Range    |  Num Elems | Visualization
                    (0   , 90.7) |       4088 | ########################################
                    (90.7, 181 ) |          5 | 
                    (181 , 272 ) |          2 | 
                    (272 , 363 ) |          0 | 
                    (363 , 453 ) |          0 | 
                    (453 , 544 ) |          0 | 
                    (544 , 635 ) |          0 | 
                    (635 , 725 ) |          0 | 
                    (725 , 816 ) |          0 | 
                    (816 , 907 ) |          1 | 
[E]         FAILED | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[I]     Comparing Output: 'output' (dtype=float32, shape=(1, 2, 64, 32)) with 'output' (dtype=float32, shape=(1, 2, 64, 32))
[I]     Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-11/15/22-18:03:01: output | Stats: mean=0.011494, std-dev=0.028912, var=0.00083588, median=0.011251, min=-0.19488 at (0, 0, 11, 0), max=0.20742 at (0, 0, 4, 3), avg-magnitude=0.020177
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.195 , -0.155 ) |          3 | 
                (-0.155 , -0.114 ) |         10 | 
                (-0.114 , -0.0742) |         53 | 
                (-0.0742, -0.034 ) |        151 | #
                (-0.034 , 0.00627) |        290 | ###
                (0.00627, 0.0465 ) |       3301 | ########################################
                (0.0465 , 0.0867 ) |        193 | ##
                (0.0867 , 0.127  ) |         65 | 
                (0.127  , 0.167  ) |         28 | 
                (0.167  , 0.207  ) |          2 | 
[I]         onnxrt-runner-N0-11/15/22-18:03:01: output | Stats: mean=0.011458, std-dev=0.04055, var=0.0016443, median=0.011217, min=-0.19488 at (0, 0, 22, 0), max=0.20742 at (0, 0, 8, 3), avg-magnitude=0.0292
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.195 , -0.155 ) |          5 | 
                (-0.155 , -0.114 ) |         22 | 
                (-0.114 , -0.0742) |        109 | #
                (-0.0742, -0.034 ) |        315 | ####
                (-0.034 , 0.00627) |        550 | ########
                (0.00627, 0.0465 ) |       2525 | ########################################
                (0.0465 , 0.0867 ) |        381 | ######
                (0.0867 , 0.127  ) |        142 | ##
                (0.127  , 0.167  ) |         45 | 
                (0.167  , 0.207  ) |          2 | 
[I]         Error Metrics: output
[I]             Minimum Required Tolerance: elemwise error | [abs=0.35695] OR [rel=906.73] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.028014, std-dev=0.039618, var=0.0015695, median=0.0053119, min=0 at (0, 0, 0, 0), max=0.35695 at (0, 0, 22, 0), avg-magnitude=0.028014
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (0     , 0.0357) |       2903 | ########################################
                    (0.0357, 0.0714) |        610 | ########
                    (0.0714, 0.107 ) |        359 | ####
                    (0.107 , 0.143 ) |        142 | #
                    (0.143 , 0.178 ) |         61 | 
                    (0.178 , 0.214 ) |         15 | 
                    (0.214 , 0.25  ) |          2 | 
                    (0.25  , 0.286 ) |          3 | 
                    (0.286 , 0.321 ) |          0 | 
                    (0.321 , 0.357 ) |          1 | 
[I]             Relative Difference | Stats: mean=1.8051, std-dev=16.308, var=265.95, median=0.42768, min=0 at (0, 0, 0, 0), max=906.73 at (0, 0, 21, 9), avg-magnitude=1.8051
[I]                 ---- Histogram ----
                    Bin Range    |  Num Elems | Visualization
                    (0   , 90.7) |       4088 | ########################################
                    (90.7, 181 ) |          5 | 
                    (181 , 272 ) |          2 | 
                    (272 , 363 ) |          0 | 
                    (363 , 453 ) |          0 | 
                    (453 , 544 ) |          0 | 
                    (544 , 635 ) |          0 | 
                    (635 , 725 ) |          0 | 
                    (725 , 816 ) |          0 | 
                    (816 , 907 ) |          1 | 
[E]         FAILED | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['onnx::Transpose_239', 'output']
[V] Loaded Module: sys

Here is the log file.
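
To help read the "Minimum Required Tolerance ... [abs=...] OR [rel=...]" line in the log above, here is a simplified sketch of an elementwise check in which an element fails only when it exceeds both the absolute and the relative tolerance. It is an assumption based on the wording of the log, not Polygraphy's actual implementation, and the function name is hypothetical.

```python
def elementwise_mismatches(test, reference, atol=1e-5, rtol=1e-5):
    """Return indices whose absolute AND relative error both exceed tolerance."""
    bad = []
    for i, (t, r) in enumerate(zip(test, reference)):
        abs_diff = abs(t - r)
        # Relative error is taken against the reference value; a zero reference
        # is treated as an infinite relative error.
        rel_diff = abs_diff / abs(r) if r != 0 else float("inf")
        if abs_diff > atol and rel_diff > rtol:
            bad.append(i)
    return bad
```

Under this reading, the logged maximum absolute difference of 0.35695 is far above the 1e-05 tolerances, so both `onnx::Transpose_239` and `output` are reported as mismatched.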

Hi,

Sorry for the delayed response. We could reproduce similar behavior.
Please allow us some time while our team works on this issue.

Thank you.