Hi,
I am using the onnx_tensorrt library to convert an ONNX model to a TensorRT model at runtime. But since it builds the TensorRT engine at runtime, it takes more than 4 minutes to complete. So I want to use the TensorRT engine file directly, without building it at runtime. For this, I have converted the ONNX model to a TensorRT engine .plan file offline, but I don't know how to use it directly in my code. Please help me with it.
import onnx
import onnx_tensorrt.backend as backend
import numpy as np
model = onnx.load("/path/to/model.onnx")
engine = backend.prepare(model, device='CUDA:1')
input_data = np.random.random(size=(32, 3, 224, 224)).astype(np.float32)
output_data = engine.run(input_data)[0]
print(output_data)
print(output_data.shape)
Currently I am using the above code. Now I have the .plan file ready. Please let me know how to run inference with the .plan file directly. Thanks.
Hi,
You can find an example of deploying an ONNX model with the TensorRT API directly below:
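For reference, here is a minimal sketch of deserializing a serialized engine (.plan) and running inference with the TensorRT Python API and PyCUDA. The file name model.plan, the binding order (input at index 0, output at index 1), and the float32 input dtype are assumptions for illustration:

import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the pre-built engine from the .plan file (placeholder path)
with open("model.plan", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one host/device buffer pair per binding (assumes static shapes)
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    size = trt.volume(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host_mem = cuda.pagelocked_empty(size, dtype)
    dev_mem = cuda.mem_alloc(host_mem.nbytes)
    host_bufs.append(host_mem)
    dev_bufs.append(dev_mem)
    bindings.append(int(dev_mem))

# Copy the input in, execute, and copy the output back
stream = cuda.Stream()
input_data = np.random.random(tuple(engine.get_binding_shape(0))).astype(np.float32)
np.copyto(host_bufs[0], input_data.ravel())
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(host_bufs[1], dev_bufs[1], stream)
stream.synchronize()
print(host_bufs[1].reshape(tuple(engine.get_binding_shape(1))))

If the engine was built with dynamic shapes, the input shape would also need to be set on the execution context (context.set_binding_shape) before allocating and copying the buffers.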
Hi,
The error comes from dynamic shape usage.
If you don't need to reshape the model at runtime, please set EXPLICIT_BATCH to use a static shape instead.
Here is an example with an ONNX model for your reference:
import cv2
import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(TRT_LOGGER)
host_inputs = …
Thanks.
Getting the following error:
File "trt.py", line 42, in Inference
np.copyto(host_inputs[0], image.ravel())
IndexError: list index out of range
Source code:
import cv2
import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda
import os
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(TRT_LOGGER)
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = []
def load_engine(trt_runtime, engine_path):
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)
trt_engine_path = "onnxx.plan"
trt_engine = load_engine(trt_runtime, trt_engine_path)
def Inference(engine):
    image = cv2.imread("1.jpeg")
    image = (2.0 / 255.0) * image.transpose((2, 0, 1)) - 1.0
    np.copyto(host_inputs[0], image.ravel())
    stream = cuda.Stream()
    context = engine.create_execution_context()
    start_time = time.time()
    cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
    stream.synchronize()
    print("execute times " + str(time.time() - start_time))
    output = host_outputs[0].reshape(np.concatenate(([1], engine.get_binding_shape(1))))
    print(np.argmax(output))

Inference(trt_engine)
Hi,
The error is caused by the host_inputs parameter.
It seems that you don't allocate the buffers for host_inputs, cuda_inputs, host_outputs, and cuda_outputs.
Please check the sample code and add the allocation to your source, as sketched below.
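A minimal allocation sketch, assuming a TensorRT 7/8-style bindings API and static shapes (run it once after load_engine, before calling Inference):

for i in range(trt_engine.num_bindings):
    size = trt.volume(trt_engine.get_binding_shape(i))
    dtype = trt.nptype(trt_engine.get_binding_dtype(i))
    host_mem = cuda.pagelocked_empty(size, dtype)   # page-locked host buffer
    cuda_mem = cuda.mem_alloc(host_mem.nbytes)      # matching device buffer
    bindings.append(int(cuda_mem))
    if trt_engine.binding_is_input(i):
        host_inputs.append(host_mem)
        cuda_inputs.append(cuda_mem)
    else:
        host_outputs.append(host_mem)
        cuda_outputs.append(cuda_mem)

With the buffers populated, host_inputs[0] has the right size for np.copyto and the IndexError should go away.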
Thanks.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.