How to execute a .engine file?

Description

I have a custom object detection network similar to YOLO. I have trained the network and confirmed its output in both the

  • Python environment
  • ONNX Runtime environment

Now I would like to deploy it with TensorRT for improved performance.

I converted the ONNX file to a .engine file using:

alias trtexec=/usr/src/tensorrt/bin/trtexec
# Note: FP32 is trtexec's default precision; there is no --fp32 flag.
trtexec --onnx=model_custom_yolo.onnx --shapes=input:1x3x384x640 --saveEngine=model_custom_yolo.engine --exportProfile=model_custom_yolo.json

I went through the NVIDIA forums and other documentation, but unfortunately I could not find a clear and concise way to execute the engine file.

Can someone share a generic way of executing an engine file?

Environment

TensorRT Version: 8.4.1
GPU Type: A5000
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.0.1+cu117
Baremetal or Container (if container which image + tag):

Hi @aravind.chakravarti ,
Please check the sample in the link, which implements a full ONNX-based pipeline for performing inference with the YOLOv3-608 network, including pre- and post-processing.

Thanks


Thanks @AakankshaS

I could extract the code below from the onnx_to_tensorrt.py file.

Let me check it against my code; if I hit any roadblocks, I will post in the forum.

with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:
    inputs, outputs, bindings, stream = common.allocate_buffers(engine)
    # Do inference
    print("Running inference on image {}...".format(input_image_path))
    # Set host input to the image. The common.do_inference function will copy the input to the GPU before executing.
    inputs[0].host = image
    trt_outputs = common.do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

# Before doing post-processing, we need to reshape the outputs, as common.do_inference gives us flat arrays.
trt_outputs = [output.reshape(shape) for output, shape in zip(trt_outputs, output_shapes)]
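A note on that final reshape: common.do_inference_v2 returns each output as a flat 1-D host buffer, and output_shapes (defined elsewhere in the sample) restores the network's layout. A minimal self-contained illustration with stand-in values (the shape below is hypothetical, not the YOLO output shape):

```python
import numpy as np

# Stand-in for the flat host buffers do_inference_v2 would return
flat_outputs = [np.arange(12, dtype=np.float32)]
# Hypothetical output shape; the real values come from the network definition
output_shapes = [(1, 3, 2, 2)]

trt_outputs = [out.reshape(shape) for out, shape in zip(flat_outputs, output_shapes)]
print(trt_outputs[0].shape)  # (1, 3, 2, 2)
```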


Those who are here looking for a working example,
I have uploaded my Hello World example code to GitHub. It is an MNIST classifier, in which I run inference using:

  • Trained Neural Network
  • ONNX
  • TensorRT (.engine / .trt) file

Code is here: code
Hope it helps!
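For reference, the generic pattern behind all of these examples can be sketched as below. This is a minimal sketch, not the exact code from the repo: it assumes the TensorRT 8.x Python bindings plus pycuda are installed, a single input binding, and an HWC image input; the names to_trt_input and run_engine are my own.

```python
import numpy as np

def to_trt_input(image_hwc):
    """Convert an HWC image to a 1xCxHxW float32 batch (assumed input layout)."""
    arr = np.asarray(image_hwc, dtype=np.float32) / 255.0
    arr = np.transpose(arr, (2, 0, 1))[None]  # HWC -> NCHW with batch size 1
    return np.ascontiguousarray(arr)

def run_engine(engine_path, input_array):
    """Deserialize a .engine file and run one inference (requires a GPU)."""
    import tensorrt as trt
    import pycuda.driver as cuda
    import pycuda.autoinit  # noqa: F401 -- creates a CUDA context

    logger = trt.Logger(trt.Logger.WARNING)
    with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    with engine.create_execution_context() as context:
        stream = cuda.Stream()
        bindings, host_bufs = [], []
        for idx in range(engine.num_bindings):
            dtype = trt.nptype(engine.get_binding_dtype(idx))
            size = trt.volume(engine.get_binding_shape(idx))
            host = cuda.pagelocked_empty(size, dtype)   # page-locked host buffer
            dev = cuda.mem_alloc(host.nbytes)           # matching device buffer
            bindings.append(int(dev))
            host_bufs.append((host, dev))
            if engine.binding_is_input(idx):
                np.copyto(host, input_array.ravel())
                cuda.memcpy_htod_async(dev, host, stream)
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        outputs = []
        for idx, (host, dev) in enumerate(host_bufs):
            if not engine.binding_is_input(idx):
                cuda.memcpy_dtoh_async(host, dev, stream)
                outputs.append(host)
        stream.synchronize()
        return outputs
```

run_engine("model_custom_yolo.engine", to_trt_input(img)) would then return the flat output buffers, which still need the reshape and post-processing steps shown earlier in the thread.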


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.