Description
Two machines with very similar software stacks (the differences should be immaterial) but different GPUs produce different folded models when the Polygraphy tool is run on the same input ONNX model.
The Polygraphy version on both machines is 0.35.1.
Environment
Machine#1:
TensorRT Version: 8.4.0.6
GPU Type: Quadro RTX 3000
NVIDIA Driver Version: R516.01 (r515_95-3) / 31.0.15.1601 (4-24-2022)
CUDA Version: 11.7
CUDNN Version: 8.1.1
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal
Machine#2:
TensorRT Version: 8.4.0.6
GPU Type: Quadro T2000
NVIDIA Driver Version: R471.68 (r471_59-5) / 30.0.14.7168 (8-5-2021)
CUDA Version: 11.4
CUDNN Version: 8.1.1
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
model.onnx (3.6 MB)
Steps To Reproduce
Run the following command on both machines:
python polygraphy surgeon sanitize model.onnx --fold-constants -o model_folded.onnx
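For comparison, roughly the same folding can be driven through the onnx-graphsurgeon Python API that Polygraphy uses under the hood (a minimal sketch; polygraphy surgeon sanitize additionally runs repeated folding passes and shape inference, so exact node counts may differ):

import onnx
import onnx_graphsurgeon as gs

# Load the model and wrap it in a graphsurgeon graph.
graph = gs.import_onnx(onnx.load("model.onnx"))

# Fold constant subgraphs, drop dead nodes, and restore topological order.
graph.fold_constants().cleanup().toposort()

onnx.save(gs.export_onnx(graph), "model_folded.onnx")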
Machine#1 report:
[I] Original Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=('Divoutput_dim_0', 'Divoutput_dim_1', 'Divoutput_dim_2')],
798 [dtype=float32, shape=('Gather798_dim_0', 'Gather798_dim_1', 'Gather798_dim_2')],
635 [dtype=float32, shape=(6450, 1)]}
---- 56 Initializer(s) ----
---- 560 Node(s) ----
[I] Folding Constants | Pass 1
[W] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.13' is recommended.
Consider installing the recommended version or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to do so automatically.
2022-05-26 15:22:32.6251047 [W:onnxruntime:, unsqueeze_elimination.cc:20 onnxruntime::UnsqueezeElimination::Apply] UnsqueezeElimination cannot remove node Unsqueeze_235
[I] Total Nodes | Original: 560, After Folding: 285 | 275 Nodes Folded
[I] Folding Constants | Pass 2
[I] Total Nodes | Original: 285, After Folding: 261 | 24 Nodes Folded
[I] Folding Constants | Pass 3
[I] Total Nodes | Original: 261, After Folding: 253 | 8 Nodes Folded
[I] Folding Constants | Pass 4
[I] Total Nodes | Original: 253, After Folding: 252 | 1 Nodes Folded
[I] Folding Constants | Pass 5
[I] Total Nodes | Original: 252, After Folding: 252 | 0 Nodes Folded
[I] Saving ONNX model to: model_folded.onnx
[I] New Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=(1, 6450, 128)],
798 [dtype=float32, shape=(1, 6450, 2)],
635 [dtype=float32, shape=(6450, 1)]}
---- 126 Initializer(s) ----
---- 252 Node(s) ----
Machine#2 report:
[I] Original Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=('Divoutput_dim_0', 'Divoutput_dim_1', 'Divoutput_dim_2')],
798 [dtype=float32, shape=('Gather798_dim_0', 'Gather798_dim_1', 'Gather798_dim_2')],
635 [dtype=float32, shape=(6450, 1)]}
---- 56 Initializer(s) ----
---- 560 Node(s) ----
[I] Folding Constants | Pass 1
[W] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.13' is recommended.
Consider installing the recommended version or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to do so automatically.
[W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)
[I] Total Nodes | Original: 560, After Folding: 373 | 187 Nodes Folded
[I] Folding Constants | Pass 2
[W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)
[I] Total Nodes | Original: 373, After Folding: 373 | 0 Nodes Folded
[I] Saving ONNX model to: model_folded.onnx
[I] New Model:
Name: torch-jit-export | Opset: 13
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 256)]}
---- 3 Graph Output(s) ----
{output [dtype=float32, shape=('Divoutput_dim_0', 'Divoutput_dim_1', 'Divoutput_dim_2')],
798 [dtype=float32, shape=('Gather798_dim_0', 'Gather798_dim_1', 'Gather798_dim_2')],
635 [dtype=float32, shape=(6450, 1)]}
---- 202 Initializer(s) ----
---- 373 Node(s) ----
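The divergence appears to come from the two "[W] Inference failed" messages on Machine#2: since onnxruntime 1.9, the providers parameter is required when creating an InferenceSession, and the installed onnx_graphsurgeon 0.3.12 evidently does not pass it, so the inference-based part of constant folding fails and folding stops early (373 nodes remain instead of 252). Upgrading onnx_graphsurgeon to >=0.3.13, as the warning recommends, is the likely remedy. For reference, this is a minimal sketch of the session construction the ORT message asks for (the provider list is an assumption; order expresses priority):

import onnxruntime as ort

# Since ORT 1.9, providers must be listed explicitly, in priority order.
# Assumption: falling back to CPUExecutionProvider is acceptable here.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)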