Two machines with very similar SW stacks but different GPUs generate different folded models using the Polygraphy tool on the same ONNX model input

Description

Two machines with very similar SW stacks (the remaining differences are immaterial) but different GPUs generate different folded models when running the Polygraphy tool on the same ONNX model input.
Polygraphy version is 0.35.1

Environment

Machine#1:
TensorRT Version: 8.4.0.6
GPU Type: Quadro RTX 3000
Nvidia Driver Version: R516.01 (r515_95-3) / 31.0.15.1601 (4-24-2022)
CUDA Version: 11.7
CUDNN Version: 8.1.1
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal

Machine#2:
TensorRT Version: 8.4.0.6
GPU Type: Quadro T2000
Nvidia Driver Version: R471.68 (r471_59-5) / 30.0.14.7168 (8-5-2021)
CUDA Version: 11.4
CUDNN Version: 8.1.1
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

model.onnx (3.6 MB)

Steps To Reproduce

Run the following command:
polygraphy surgeon sanitize model.onnx --fold-constants -o model_folded.onnx

Machine#1 report:

[I] Original Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=('Divoutput_dim_0', 'Divoutput_dim_1', 'Divoutput_dim_2')],
798 [dtype=float32, shape=('Gather798_dim_0', 'Gather798_dim_1', 'Gather798_dim_2')],
635 [dtype=float32, shape=(6450, 1)]}
---- 56 Initializer(s) ----
---- 560 Node(s) ----
[I] Folding Constants | Pass 1
[W] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.13' is recommended.
Consider installing the recommended version or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to do so automatically.
2022-05-26 15:22:32.6251047 [W:onnxruntime:, unsqueeze_elimination.cc:20 onnxruntime::UnsqueezeElimination::Apply] UnsqueezeElimination cannot remove node Unsqueeze_235
[I] Total Nodes | Original: 560, After Folding: 285 | 275 Nodes Folded
[I] Folding Constants | Pass 2
[I] Total Nodes | Original: 285, After Folding: 261 | 24 Nodes Folded
[I] Folding Constants | Pass 3
[I] Total Nodes | Original: 261, After Folding: 253 | 8 Nodes Folded
[I] Folding Constants | Pass 4
[I] Total Nodes | Original: 253, After Folding: 252 | 1 Nodes Folded
[I] Folding Constants | Pass 5
[I] Total Nodes | Original: 252, After Folding: 252 | 0 Nodes Folded
[I] Saving ONNX model to: model_folded.onnx
[I] New Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=(1, 6450, 128)],
798 [dtype=float32, shape=(1, 6450, 2)],
635 [dtype=float32, shape=(6450, 1)]}
---- 126 Initializer(s) ----
---- 252 Node(s) ----

Machine#2 report:

[I] Original Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=('Divoutput_dim_0', 'Divoutput_dim_1', 'Divoutput_dim_2')],
798 [dtype=float32, shape=('Gather798_dim_0', 'Gather798_dim_1', 'Gather798_dim_2')],
635 [dtype=float32, shape=(6450, 1)]}
---- 56 Initializer(s) ----
---- 560 Node(s) ----
[I] Folding Constants | Pass 1
[W] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.13' is recommended.
Consider installing the recommended version or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to do so automatically.
[W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)
[I] Total Nodes | Original: 560, After Folding: 373 | 187 Nodes Folded
[I] Folding Constants | Pass 2
[W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)
[I] Total Nodes | Original: 373, After Folding: 373 | 0 Nodes Folded
[I] Saving ONNX model to: model_folded.onnx
[I] New Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=('Divoutput_dim_0', 'Divoutput_dim_1', 'Divoutput_dim_2')],
798 [dtype=float32, shape=('Gather798_dim_0', 'Gather798_dim_1', 'Gather798_dim_2')],
635 [dtype=float32, shape=(6450, 1)]}
---- 202 Initializer(s) ----
---- 373 Node(s) ----

Hi,
Could you share the ONNX model and the script, if you have not already, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx

# Usage: python check_model.py <model.onnx>
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

Thanks,
The model was validated using the ONNX checker and found to be OK.
Attached is the trtexec verbose report:
trtexec_verbose_report.txt (36.5 KB)

Hi,

Are you using different versions of onnx or onnx-graphsurgeon?

No, and that is exactly the problem.
I checked and verified the versions of all installed Python packages on both machines using the "python -m pip list" command and found them essentially identical.
Specifically, the onnx* package versions are:
onnx 1.10.2
onnx-graphsurgeon 0.3.12
onnx-simplifier 0.3.6
onnxoptimizer 0.2.6
onnxruntime 1.10.0
onnxruntime-gpu 1.10.0
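
A quick, hypothetical way to confirm that the full package sets really match (not just the onnx* ones) is to diff a sorted pip freeze dump from each machine; the file names below are placeholders:

```shell
# Run on Machine #1:
python -m pip freeze | sort > machine1_packages.txt
# Run on Machine #2 (then copy the file to one machine):
python -m pip freeze | sort > machine2_packages.txt
# Any line printed here is a package or version mismatch:
diff machine1_packages.txt machine2_packages.txt
```

An empty diff means the installed wheel versions are identical; note it still cannot detect differences caused by installation order, which turned out to matter here.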

Thanks,

It looks like this is the issue:

This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)

Can you try either upgrading ONNX-GraphSurgeon to the latest version or downgrading ONNX Runtime to a version < 1.9.0 on Machine #2?
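
For reference, the workaround the error message describes can be sketched as below. The pick_providers helper is our own illustrative name, not part of the ONNX Runtime API; only InferenceSession and get_available_providers are real ORT calls:

```python
def pick_providers(available):
    """Return the preferred execution providers, in priority order,
    restricted to those actually present in this ORT build.
    (Helper name is illustrative, not an ONNX Runtime API.)"""
    preferred = [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]
    return [p for p in preferred if p in available]


# With onnxruntime installed, the session would then be created as:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

Since ORT 1.9, omitting the providers argument raises exactly the error seen in the Machine #2 log, which is why the constant-folding passes there fall back and fold nothing.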

I’m the user of the Machine #2.

We solved the problem. We found that the installation order of the onnxruntime and onnxruntime-gpu packages matters: when we installed onnxruntime before onnxruntime-gpu we hit the problem, but when we installed onnxruntime-gpu first and onnxruntime afterwards, the problem was resolved.
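
In case it helps others, the reinstall sequence that worked for us looked roughly like this (versions are the ones from our environment; adjust as needed):

```shell
# Remove both wheels so the shared onnxruntime package directory starts clean
python -m pip uninstall -y onnxruntime onnxruntime-gpu
# Install the GPU wheel first...
python -m pip install onnxruntime-gpu==1.10.0
# ...and the CPU wheel afterwards
python -m pip install onnxruntime==1.10.0
```

Since both wheels install into the same onnxruntime package directory, whichever is installed last overwrites the shared files, which is presumably why the order matters.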

Your comment helped us find the real problem, so thank you.