Could not run inference on ONNX model with the TensorrtExecutionProvider

Description

I want to run sample code to do inference on a specific model. First I converted it from a .pb file to an .onnx file. Now I want to run inference with any of the following providers:

providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

The other providers (CUDAExecutionProvider and CPUExecutionProvider) work properly, but it seems the model cannot actually be run with the TensorrtExecutionProvider.
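
One way to confirm whether the TensorRT provider was actually registered (instead of onnxruntime silently falling back to CUDA/CPU) is to query the session after it is created; a minimal sketch using the standard onnxruntime calls:

import onnxruntime

# Providers compiled into this onnxruntime build
print(onnxruntime.get_available_providers())

# Providers the session actually registered (fallback happens silently)
session = onnxruntime.InferenceSession(
    "lightqnet-dm100.onnx",
    providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'],
)
print(session.get_providers())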

Environment

TensorRT Version: 8.2.0.6
GPU Type: 1080 Ti
Nvidia Driver Version: 470.82.01
CUDA Version: 11.4
CUDNN Version: 8.2.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.9.1+cu111
Baremetal or Container (if container which image + tag):

Relevant Files

The Python code I use is below:

import onnxruntime
import cv2
import numpy as np
import time

# Load and preprocess the input image: resize to 96x96 and normalize to [-1, 1]
img_path = "peeeeeey.jpeg"
image = cv2.imread(img_path, cv2.IMREAD_COLOR)
image = cv2.resize(image, (96, 96)).astype(np.float32)
image = (image - 128) / 128
img_data = np.expand_dims(image, 0)

print(f"input shape: {np.shape(img_data)}")

model_path = "lightqnet-dm100.onnx"
session_option = onnxruntime.SessionOptions()
session_option.log_severity_level = 4

# Create the session with only the TensorRT execution provider
model = onnxruntime.InferenceSession(model_path, sess_options=session_option, providers=['TensorrtExecutionProvider'])
ort_inputs_name = model.get_inputs()[0].name
ort_outputs_names = [out.name for out in model.get_outputs()]

start = time.time()
ort_outs = model.run(ort_outputs_names, {ort_inputs_name: img_data.astype('float32')})
outputs = np.array(ort_outs[0]).astype("float32")
print(f"TensorrtExecutionProvider quality={outputs[0,0]}, inference time = {time.time() - start}")

Steps To Reproduce

  • When I run the code above, it shows some warnings and I think it does not actually run with the TensorrtExecutionProvider. I also measured the running time for each provider, and it is obvious that the TensorRT mode does not work properly.
  • Warnings:
2022-11-08 14:47:13.940427040 [W:onnxruntime:Default, tensorrt_execution_provider.h:53 log] [2022-11-08 11:17:13 WARNING] /onnxruntime_src/cmake/external/onnx-tensorrt/onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2022-11-08 14:47:13.946141807 [W:onnxruntime:Default, tensorrt_execution_provider.h:53 log] [2022-11-08 11:17:13 WARNING] /onnxruntime_src/cmake/external/onnx-tensorrt/ShapedWeights.cpp:170: Weights UncertaintyModule/Bottleneck/weights/read:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
2022-11-08 14:47:14.235125857 [W:onnxruntime:Default, tensorrt_execution_provider.h:53 log] [2022-11-08 11:17:14 WARNING] /onnxruntime_src/cmake/external/onnx-tensorrt/ShapedWeights.cpp:170: Weights UncertaintyModule/Bottleneck/weights/read:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
  • Output and running time for each provider:
TensorrtExecutionProvider quality=0.3176532983779907, inference time = 0.015826723098754884
CUDAExecutionProvider quality=0.3176532983779907, inference time = 0.0009909896850585937
CPUExecutionProvider quality=0.3176534175872803, inference time = 0.0029390087127685546

As you can see, the running time for the TensorrtExecutionProvider is higher than for the CUDAExecutionProvider and CPUExecutionProvider. Also, the outputs of the TensorrtExecutionProvider and CUDAExecutionProvider are exactly equal, while they should not be.
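
For reference, a fairer timing comparison would probably warm the session up first (the first call with the TensorRT provider includes building the engine) and average over several runs; a small sketch reusing the variables from the script above:

# warm-up: the first run with the TensorRT provider also builds the engine
model.run(ort_outputs_names, {ort_inputs_name: img_data.astype('float32')})

runs = 100
start = time.time()
for _ in range(runs):
    ort_outs = model.run(ort_outputs_names, {ort_inputs_name: img_data.astype('float32')})
print(f"average inference time = {(time.time() - start) / runs}")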

Hi,

Could you please try the latest TensorRT version, 8.5.1?

Thank you.