Two machines with very similar SW stacks but different GPUs generate different folded models using the Polygraphy tool on the same ONNX model input

Description

Two machines with very similar SW stacks (the remaining differences are immaterial) but different GPUs generate different folded models when running the Polygraphy tool on the same ONNX model input.
Polygraphy version is 0.35.1

Environment

Machine#1:
TensorRT Version: 8.4.0.6
GPU Type: Quadro RTX 3000
Nvidia Driver Version: R516.01 (r515_95-3) / 31.0.15.1601 (4-24-2022)
CUDA Version: 11.7
CUDNN Version: 8.1.1
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal

Machine#2:
TensorRT Version: 8.4.0.6
GPU Type: Quadro T2000
Nvidia Driver Version: R471.68 (r471_59-5) / 30.0.14.7168 (8-5-2021)
CUDA Version: 11.4
CUDNN Version: 8.1.1
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

model.onnx (3.6 MB)

Steps To Reproduce

Run the following command:
polygraphy surgeon sanitize model.onnx --fold-constants -o model_folded.onnx

Machine#1 report:

[I] Original Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=('Divoutput_dim_0', 'Divoutput_dim_1', 'Divoutput_dim_2')],
798 [dtype=float32, shape=('Gather798_dim_0', 'Gather798_dim_1', 'Gather798_dim_2')],
635 [dtype=float32, shape=(6450, 1)]}
---- 56 Initializer(s) ----
---- 560 Node(s) ----
[I] Folding Constants | Pass 1
[W] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.13' is recommended.
Consider installing the recommended version or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to do so automatically.
2022-05-26 15:22:32.6251047 [W:onnxruntime:, unsqueeze_elimination.cc:20 onnxruntime::UnsqueezeElimination::Apply] UnsqueezeElimination cannot remove node Unsqueeze_235
[I] Total Nodes | Original: 560, After Folding: 285 | 275 Nodes Folded
[I] Folding Constants | Pass 2
[I] Total Nodes | Original: 285, After Folding: 261 | 24 Nodes Folded
[I] Folding Constants | Pass 3
[I] Total Nodes | Original: 261, After Folding: 253 | 8 Nodes Folded
[I] Folding Constants | Pass 4
[I] Total Nodes | Original: 253, After Folding: 252 | 1 Nodes Folded
[I] Folding Constants | Pass 5
[I] Total Nodes | Original: 252, After Folding: 252 | 0 Nodes Folded
[I] Saving ONNX model to: model_folded.onnx
[I] New Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=(1, 6450, 128)],
798 [dtype=float32, shape=(1, 6450, 2)],
635 [dtype=float32, shape=(6450, 1)]}
---- 126 Initializer(s) ----
---- 252 Node(s) ----

Machine#2 report:

[I] Original Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=('Divoutput_dim_0', 'Divoutput_dim_1', 'Divoutput_dim_2')],
798 [dtype=float32, shape=('Gather798_dim_0', 'Gather798_dim_1', 'Gather798_dim_2')],
635 [dtype=float32, shape=(6450, 1)]}
---- 56 Initializer(s) ----
---- 560 Node(s) ----
[I] Folding Constants | Pass 1
[W] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.13' is recommended.
Consider installing the recommended version or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to do so automatically.
[W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)
[I] Total Nodes | Original: 560, After Folding: 373 | 187 Nodes Folded
[I] Folding Constants | Pass 2
[W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)
[I] Total Nodes | Original: 373, After Folding: 373 | 0 Nodes Folded
[I] Saving ONNX model to: model_folded.onnx
[I] New Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=('Divoutput_dim_0', 'Divoutput_dim_1', 'Divoutput_dim_2')],
798 [dtype=float32, shape=('Gather798_dim_0', 'Gather798_dim_1', 'Gather798_dim_2')],
635 [dtype=float32, shape=(6450, 1)]}
---- 202 Initializer(s) ----
---- 373 Node(s) ----

Hi,
Could you share the ONNX model and the script, if you have not already, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx

# Usage: python check_model.py <model.onnx>
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

Thanks,
The model was validated using the ONNX checker and found to be OK.
Attached is the trtexec verbose report:
trtexec_verbose_report.txt (36.5 KB)

Hi,

Are you using different versions of onnx or onnx-graphsurgeon?

No, and that is exactly the problem.
I checked and verified the versions of all installed Python packages on both machines using the "python -m pip list" command and found them essentially identical.
Specifically, the onnx* package versions are:
onnx 1.10.2
onnx-graphsurgeon 0.3.12
onnx-simplifier 0.3.6
onnxoptimizer 0.2.6
onnxruntime 1.10.0
onnxruntime-gpu 1.10.0
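
A quick, hypothetical way to confirm that the full package sets really match (not just the onnx* ones) is to diff a sorted pip freeze dump from each machine; the file names below are placeholders:

```shell
# Run on Machine #1:
python -m pip freeze | sort > machine1_packages.txt
# Run on Machine #2 (then copy the file to one machine):
python -m pip freeze | sort > machine2_packages.txt
# Any line printed here is a package or version mismatch:
diff machine1_packages.txt machine2_packages.txt
```

An empty diff means the installed wheel versions are identical; note it still cannot detect differences caused by installation order, which turned out to matter here.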

Thanks,

It looks like this is the issue:

This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)

Can you try either upgrading ONNX-GraphSurgeon to the latest version or downgrading ONNX Runtime to a version < 1.9.0 on Machine #2?
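
For reference, the workaround the error message describes can be sketched as below. The pick_providers helper is our own illustrative name, not part of the ONNX Runtime API; only InferenceSession and get_available_providers are real ORT calls:

```python
def pick_providers(available):
    """Return the preferred execution providers, in priority order,
    restricted to those actually present in this ORT build.
    (Helper name is illustrative, not an ONNX Runtime API.)"""
    preferred = [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]
    return [p for p in preferred if p in available]


# With onnxruntime installed, the session would then be created as:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

Since ORT 1.9, omitting the providers argument raises exactly the error seen in the Machine #2 log, which is why the constant-folding passes there fall back and fold nothing.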

I’m the user of the Machine #2.

We solved the problem. We found that the installation order of the onnxruntime and onnxruntime-gpu packages matters: when we installed onnxruntime before onnxruntime-gpu we hit the problem, but when we installed onnxruntime-gpu first and onnxruntime afterwards, the problem was resolved.
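
In case it helps others, the reinstall sequence that worked for us looked roughly like this (versions are the ones from our environment; adjust as needed):

```shell
# Remove both wheels so the shared onnxruntime package directory starts clean
python -m pip uninstall -y onnxruntime onnxruntime-gpu
# Install the GPU wheel first...
python -m pip install onnxruntime-gpu==1.10.0
# ...and the CPU wheel afterwards
python -m pip install onnxruntime==1.10.0
```

Since both wheels install into the same onnxruntime package directory, whichever is installed last overwrites the shared files, which is presumably why the order matters.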

Your comment helped us find the real problem, so thank you.