How to load a TensorRT engine directly without building at runtime

Hi,

I am using the onnx_tensorrt library to convert an ONNX model to a TensorRT engine at runtime. Because the engine is built at runtime, this takes more than 4 minutes to complete. I want to use a prebuilt TensorRT engine file directly instead of building it at runtime. For this, I have converted the ONNX model to a TensorRT engine (.plan) file offline, but I don't know how to use it directly in my code. Please help me with it.

import onnx
import onnx_tensorrt.backend as backend
import numpy as np

model = onnx.load("/path/to/model.onnx")
engine = backend.prepare(model, device='CUDA:1')
input_data = np.random.random(size=(32, 3, 224, 224)).astype(np.float32)
output_data = engine.run(input_data)[0]
print(output_data)
print(output_data.shape)

Currently I am using the above code. Now I have the .plan file ready. Please let me know how to run inference with the .plan file directly. Thanks.

Hi,

You can find an example of deploying an ONNX model with the TensorRT API directly below:
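
In outline, a rough sketch of deserializing a prebuilt engine (not the linked sample; it assumes the TensorRT Python API, and "model.plan" is a placeholder path) looks like this:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Read the serialized engine from disk and deserialize it; no build step at runtime.
with open("model.plan", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

# The execution context is then used for inference, after allocating
# host/device buffers for each binding (e.g. with pycuda).
context = engine.create_execution_context()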

Thanks.

I am getting the following error.

File "trt.py", line 42, in Inference
np.copyto(host_inputs[0], image.ravel())
IndexError: list index out of range

Source code:

import cv2
import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda
import os

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(TRT_LOGGER)

host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = []

def load_engine(trt_runtime, engine_path):
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)
trt_engine_path = "onnxx.plan"
trt_engine = load_engine(trt_runtime, trt_engine_path)

def Inference(engine):
    image = cv2.imread("1.jpeg")

    image = (2.0 / 255.0) * image.transpose((2, 0, 1)) - 1.0

    np.copyto(host_inputs[0], image.ravel())
    stream = cuda.Stream()
    context = engine.create_execution_context()

    start_time = time.time()
    cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
    stream.synchronize()
    print("execute times " + str(time.time() - start_time))

    output = host_outputs[0].reshape(np.concatenate(([1], engine.get_binding_shape(1))))
    print(np.argmax(output))

Inference(trt_engine)

Hi,

The error is caused by the empty host_inputs list.
It seems that you don't allocate buffers for host_inputs, cuda_inputs, host_outputs, and cuda_outputs.

Please check the sample code and add it to your source.
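
For reference, the allocation step usually looks roughly like this (a sketch only, assuming the binding API your script already uses; it must run before Inference() so the lists are populated):

# Sketch: allocate one page-locked host buffer and one device buffer per binding.
for binding in trt_engine:
    size = trt.volume(trt_engine.get_binding_shape(binding))
    dtype = trt.nptype(trt_engine.get_binding_dtype(binding))
    host_mem = cuda.pagelocked_empty(size, dtype)   # host-side staging buffer
    cuda_mem = cuda.mem_alloc(host_mem.nbytes)      # matching GPU buffer
    bindings.append(int(cuda_mem))
    if trt_engine.binding_is_input(binding):
        host_inputs.append(host_mem)
        cuda_inputs.append(cuda_mem)
    else:
        host_outputs.append(host_mem)
        cuda_outputs.append(cuda_mem)
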
Thanks.
