Description
I want to run sample code to do inference on a specific model. First I converted it from a .pb file to an .onnx file. Now I want to run inference with any of these providers:
providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
The other modes (CUDAExecutionProvider and CPUExecutionProvider) work properly, but it seems that the model cannot actually be run with the TensorrtExecutionProvider.
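For reference, a minimal way to check whether the TensorRT execution provider is even available in the installed onnxruntime build, and which providers a session actually registered (the model path is taken from the code further below):

import onnxruntime

# Providers compiled into the installed onnxruntime package
print(onnxruntime.get_available_providers())

# Providers the session actually registered, in priority order
session = onnxruntime.InferenceSession(
    "lightqnet-dm100.onnx",
    providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'],
)
print(session.get_providers())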
Environment
TensorRT Version: 8.2.0.6
GPU Type: 1080 Ti
Nvidia Driver Version: 470.82.01
CUDA Version: 11.4
CUDNN Version: 8.2.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.9.1+cu111
Baremetal or Container (if container which image + tag):
Relevant Files
Below is the Python code:
import onnxruntime
import cv2
import numpy as np
import time

# Load and preprocess the input image: resize to 96x96 and normalize to [-1, 1]
img_path = "peeeeeey.jpeg"
image = cv2.imread(img_path, cv2.IMREAD_COLOR)
image = cv2.resize(image, (96, 96)).astype(np.float32)
image = (image - 128) / 128
img_data = np.expand_dims(image, 0)  # add batch dimension -> NHWC
print(f"ONNX input shape: {np.shape(img_data)}")

model_path = "lightqnet-dm100.onnx"
session_option = onnxruntime.SessionOptions()
session_option.log_severity_level = 4  # log fatal errors only
model = onnxruntime.InferenceSession(model_path, sess_options=session_option, providers=['TensorrtExecutionProvider'])

ort_input_name = model.get_inputs()[0].name
ort_output_names = [out.name for out in model.get_outputs()]

start = time.time()
ort_outs = model.run(ort_output_names, {ort_input_name: img_data.astype('float32')})
outputs = np.array(ort_outs[0]).astype("float32")
print(f"TensorrtExecutionProvider quality={outputs[0,0]}, inference time = {time.time() - start}")
Steps To Reproduce
- When I run the code above, it shows some warnings, and I think it does not actually run in TensorrtExecutionProvider mode. I also checked the running time for each mode, and it is clear that the TensorRT mode does not work properly.
- Warnings:
2022-11-08 14:47:13.940427040 [W:onnxruntime:Default, tensorrt_execution_provider.h:53 log] [2022-11-08 11:17:13 WARNING] /onnxruntime_src/cmake/external/onnx-tensorrt/onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2022-11-08 14:47:13.946141807 [W:onnxruntime:Default, tensorrt_execution_provider.h:53 log] [2022-11-08 11:17:13 WARNING] /onnxruntime_src/cmake/external/onnx-tensorrt/ShapedWeights.cpp:170: Weights UncertaintyModule/Bottleneck/weights/read:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
2022-11-08 14:47:14.235125857 [W:onnxruntime:Default, tensorrt_execution_provider.h:53 log] [2022-11-08 11:17:14 WARNING] /onnxruntime_src/cmake/external/onnx-tensorrt/ShapedWeights.cpp:170: Weights UncertaintyModule/Bottleneck/weights/read:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
- Output and running time for each mode:
TensorrtExecutionProvider quality=0.3176532983779907, inference time = 0.015826723098754884
CUDAExecutionProvider quality=0.3176532983779907, inference time = 0.0009909896850585937
CPUExecutionProvider quality=0.3176534175872803, inference time = 0.0029390087127685546
As you can see, the running time for TensorrtExecutionProvider is higher than for the CUDAExecutionProvider and CPUExecutionProvider modes. Also, the outputs of TensorrtExecutionProvider and CUDAExecutionProvider are exactly equal, which should not be the case.
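If the extra TensorRT time comes from the engine being rebuilt for every session, onnxruntime's TensorRT execution provider supports engine caching via provider options. A sketch, assuming an onnxruntime build that accepts (name, options) tuples in the providers list; the cache directory is just an example, not from the original post:

providers = [
    ('TensorrtExecutionProvider', {
        'trt_engine_cache_enable': True,          # serialize the built engine for reuse
        'trt_engine_cache_path': './trt_cache',   # example cache directory
    }),
    'CUDAExecutionProvider',
    'CPUExecutionProvider',
]
model = onnxruntime.InferenceSession(model_path, sess_options=session_option, providers=providers)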