Very slow when opening the .trt file and creating the Runtime


I recently started learning TensorRT and successfully converted an HRNet model from ONNX to a TensorRT engine. But when I try to use the engine in Python 3.8, the statement “with open("path/to/trt/file", "rb") as f, trt.Runtime(logger) as runtime:” takes far too long. There is no warning or error message, and the CPU is busy the whole time without producing any output. I don't know whether the model size is the cause; my .trt file is about 500 MB. While the Python script is running, gpustat and nvidia-smi also respond very slowly.
Could you explain why this happens and how I can fix it?


TensorRT Version:
GPU Type: Titan RTX
Nvidia Driver Version: 440
CUDA Version: 10.2
CUDNN Version: 8.3.1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.9

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
# Read the serialized engine from disk and deserialize it.
with open(".myhrnetw48out.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print every binding (input or output) of the engine.
model_all_names = []
for idx in range(engine.num_bindings):
    is_input = engine.binding_is_input(idx)
    name = engine.get_binding_name(idx)
    op_type = engine.get_binding_dtype(idx)
    shape = engine.get_binding_shape(idx)
    model_all_names.append(name)
    print('binding id:', idx, '  is input:', is_input, '  binding name:', name, '  shape:', shape, '  type:', op_type)


I couldn't even kill the process running the Python script.

I converted the ONNX model to a TensorRT engine using trtexec.
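
To narrow down which step is slow, it may help to time the file read, the Runtime creation, and the deserialization separately. The following is only a minimal sketch: the helper names `timed` and `load_engine_timed` are made up for illustration, and it assumes a TensorRT version that provides `trt.Runtime` and `deserialize_cuda_engine`, as in the script above.

```python
import time


def timed(label, fn):
    """Run fn(), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return result, elapsed


def load_engine_timed(engine_path):
    """Time each loading step on its own (hypothetical helper)."""
    # Imported here so that `timed` can be used without TensorRT installed.
    import tensorrt as trt

    # VERBOSE makes TensorRT log more detail than INFO while it works.
    logger = trt.Logger(trt.Logger.VERBOSE)

    def read_file():
        with open(engine_path, "rb") as f:
            return f.read()

    data, _ = timed("read engine file", read_file)
    runtime, _ = timed("create runtime", lambda: trt.Runtime(logger))
    engine, _ = timed(
        "deserialize engine", lambda: runtime.deserialize_cuda_engine(data)
    )
    # Return the runtime too so it stays alive as long as the engine.
    return runtime, engine
```

Calling `load_engine_timed("path/to/trt/file")` prints one timing line per step, which shows whether the time goes into disk I/O, CUDA/TensorRT initialization, or the deserialization itself.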

Please share the model, script, profiler output, and performance output, if you have not already, so that we can help you better.
Alternatively, you can try running your model with the trtexec command.
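
As a rough illustration of the trtexec suggestion, the sketch below builds and runs a command that loads an existing engine file. `--loadEngine` and `--verbose` are trtexec options; the function names and the `model.trt` path are placeholders, and it assumes `trtexec` is on the PATH.

```python
import shlex
import subprocess


def trtexec_load_command(engine_path, verbose=False, trtexec="trtexec"):
    """Build the argument list for loading a serialized engine with trtexec."""
    cmd = [trtexec, f"--loadEngine={engine_path}"]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_trtexec(engine_path, verbose=False):
    """Run trtexec on the engine and return its exit code."""
    cmd = trtexec_load_command(engine_path, verbose=verbose)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

For example, `run_trtexec("model.trt", verbose=True)` runs `trtexec --loadEngine=model.trt --verbose`. If trtexec is just as slow to load the engine, the problem is unlikely to be in the Python script.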

When measuring model performance, make sure you consider the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:


I found the reason: I built the engine on a T4 and was running it on a Titan RTX.
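
One way to catch this kind of mismatch early is to record the GPU name next to the engine when it is built and compare it before loading. This is only a sketch under assumptions: the `.gpu.json` sidecar file and all function names are invented for illustration and are not part of TensorRT, and the GPU name is read through nvidia-smi's `--query-gpu=name` option.

```python
import json
import subprocess
from pathlib import Path


def current_gpu_name(index=0):
    """Ask nvidia-smi for the name of the GPU at the given index."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader", f"--id={index}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.strip().splitlines()[0]


def sidecar_path(engine_path):
    """Hypothetical convention: store metadata in <engine>.gpu.json."""
    return Path(str(engine_path) + ".gpu.json")


def record_build_gpu(engine_path, gpu_name):
    """Call on the build machine right after the engine is saved."""
    sidecar_path(engine_path).write_text(json.dumps({"gpu_name": gpu_name}))


def built_on_this_gpu(engine_path, gpu_name):
    """Return True/False if the sidecar exists, or None if nothing was recorded."""
    side = sidecar_path(engine_path)
    if not side.exists():
        return None
    return json.loads(side.read_text())["gpu_name"] == gpu_name
```

On the build machine, call `record_build_gpu(path, current_gpu_name())`; on the target machine, check `built_on_this_gpu(path, current_gpu_name())` before deserializing, and rebuild the engine from the ONNX file on the target GPU if it returns False.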