Very slow when opening the TRT file and creating the Runtime

Description

Hi,
I recently started learning TensorRT, and I successfully converted an HRNet model from ONNX to TRT. But when I try to use the TRT engine in Python 3.8, it spends a very long time inside "with open("path/to/trt/file", "rb") as f, trt.Runtime(logger) as runtime:". There is no warning or error message, and the CPU is busy the whole time without producing any output. I don't know whether the model size is the reason; my TRT file is about 500 MB. While the Python script is running, gpustat and nvidia-smi also respond very, very slowly.
Could you explain why this happens, and how I can fix it?

Environment

TensorRT Version: 8.2.1.8
GPU Type: Titan RTX
Nvidia Driver Version: 440
CUDA Version: 10.2
CUDNN Version: 8.3.1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.9

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)

# Read the serialized engine file and deserialize it into an ICudaEngine.
with open(".myhrnetw48out.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# List every binding (inputs and outputs) of the deserialized engine.
model_all_names = []
for idx in range(engine.num_bindings):
    is_input = engine.binding_is_input(idx)
    name = engine.get_binding_name(idx)
    op_type = engine.get_binding_dtype(idx)
    model_all_names.append(name)
    shape = engine.get_binding_shape(idx)
    print('binding id:', idx, '  is input:', is_input, '  binding name:', name,
          '  shape:', shape, '  type:', op_type)

Thanks!

I couldn't even kill the process running the Python file.

I converted my TRT engine from the ONNX model using trtexec.
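The command I used was roughly of this form (the file and engine names here are placeholders, not the exact ones):

trtexec --onnx=hrnet_w48.onnx --saveEngine=myhrnetw48out.trt --workspace=4096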

Hi,
Could you please share the model, script, profiler, and performance output (if not shared already) so that we can help you better?
Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
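For example, an already built engine can be benchmarked directly with something like the following (the engine path is a placeholder):

trtexec --loadEngine=path/to/engine.trt --iterations=100

trtexec reports latency and throughput for the engine, which gives you a useful baseline.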

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#model-accuracy
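
As a rough illustration, a measurement of this kind in Python could look like the sketch below. It assumes pycuda is installed, the engine uses static input shapes, and the engine path is a placeholder; only the execute_v2 calls are timed, so data pre- and post-processing are excluded.

import time
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import tensorrt as trt

ENGINE_PATH = "path/to/engine.trt"  # placeholder path

logger = trt.Logger(trt.Logger.INFO)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate a device buffer for every binding (assumes static shapes).
bindings, host_inputs = [], {}
for idx in range(engine.num_bindings):
    shape = tuple(engine.get_binding_shape(idx))
    dtype = trt.nptype(engine.get_binding_dtype(idx))
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    dev_buf = cuda.mem_alloc(nbytes)
    bindings.append(int(dev_buf))
    if engine.binding_is_input(idx):
        host_inputs[idx] = (np.random.rand(*shape).astype(dtype), dev_buf)

# Copy dummy inputs once, outside the timed region (pre-processing excluded).
for host, dev_buf in host_inputs.values():
    cuda.memcpy_htod(dev_buf, host)

# Warm up, then time only the synchronous inference calls.
for _ in range(10):
    context.execute_v2(bindings)
iters = 100
start = time.perf_counter()
for _ in range(iters):
    context.execute_v2(bindings)
elapsed = time.perf_counter() - start
print("mean latency: %.2f ms, throughput: %.1f infer/s"
      % (1000.0 * elapsed / iters, iters / elapsed))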

Thanks!

Hi,
I found the reason: I built the engine on a T4 and then tried to use it on a Titan RTX. TensorRT engines are specific to the GPU they are built on, so the engine has to be rebuilt on the target GPU.
Thanks!