How to get results (bounding boxes, class IDs, confidences) of object detection (YOLOv5) in TensorRT

Hello, I’m trying to run object detection (YOLOv5) in TensorRT,
but I’m not sure how to get the results.

My workflow:
The model is trained with YOLOv5 and works correctly in the PyTorch framework.
Convert the model to ONNX format on an Ubuntu PC.
Convert the ONNX model to a TensorRT engine on the Jetson Nano (a sketch of this step is below).
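
For reference, the ONNX-to-TensorRT step uses a get_engine helper in the style of the TensorRT ONNX Python samples. The sketch below is only an approximation of that pattern (the exact builder calls differ between TensorRT versions), not the exact file I use:

import os
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def get_engine(onnx_path, engine_path):
    # Reuse a previously serialized engine if one exists.
    if os.path.exists(engine_path):
        with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    # Otherwise parse the ONNX file and build a new engine.
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser, \
         builder.create_builder_config() as config:
        config.max_workspace_size = 1 << 28  # 256 MiB; keep small on the Nano
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        engine = builder.build_engine(network, config)
        with open(engine_path, "wb") as f:
            f.write(engine.serialize())
        return engine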

Problem:
I ran inference with the TensorRT model, but it returns an array of [nan, nan, nan, … , nan].
How do I get the bounding boxes, confidences, and class IDs?

def main():
    SHOW_IMAGE=True

    data_root="../Image/gratin_od/images"
    files = glob.glob(data_root+"/*.bmp")

    print("Got file names")
    files.sort()
    print(files[75])
    all_judges=np.empty(0)


    with get_engine("./models/gratin.onnx", "./models/gratin.trt") as engine, \
        engine.create_execution_context() as context:
        # Build/load the engine, then allocate host/device buffer pairs and a CUDA stream.
        inputs, outputs, bindings, stream = common.allocate_buffers(engine)

        for i in range(1):
            x_cv = cv2.imread(str(files[75]))
            start = time.time()
            x_show = cv2.cvtColor(x_cv, cv2.COLOR_BGR2RGB)


            inputs[0].host = x_show
            output= common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

            print(output[0])   
            #print>> [nan nan nan ... nan nan nan]
            #How to get bounding box, class id and confidence?


            end = time.time()
            ms = 1000.0*(end-start)
            print("inference time: %.1f ms" % ms)


            if(SHOW_IMAGE):
                cv2.imshow("image", x_cv)
                cv2.waitKey(0)&0xFF

    print("END")

RunTRT2_Gratin_Yolov5.py (2.8 KB)

Hi,

Please note that TensorRT requires GPU memory for inference.

In your do_inference function, have you copied the input buffer from CPU to GPU
and copied the output back from GPU to CPU?

You can find an example below:
https://forums.developer.nvidia.com/t/custom-resnet-jetson-xavier/160448/3

...
cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
context.execute_async(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
...
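
If you are using the common.py helper from the samples, common.do_inference already wraps these copies. Simplified, it does roughly this:

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data from the pinned host buffers to the device.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference asynchronously on the stream.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the device to the pinned host buffers.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize before reading the host buffers.
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]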

Thanks.

Thank you AastaLLL,

I did the memcpy via common.allocate_buffers (see the script below).
I added image resizing before inference; 640 is the preprocessing size I used when training the model.

And “common.do_inference” returns this:

[array([           nan,            nan,            nan, ...,
        1.7374168e+27, -1.6639190e+27, -1.1799816e+27], dtype=float32), array([nan, nan, nan, ..., nan, nan, nan], dtype=float32), array([nan, nan, nan, ..., nan, nan, nan], dtype=float32), array([nan, nan, nan, ..., nan, nan, nan], dtype=float32)]

How do I convert this to bounding boxes, confidences, and class IDs?


# Adapted from the TensorRT Python samples (originally a UFF MNIST sample)
from random import randint
from PIL import Image
import numpy as np
import pathlib
#!pip install pycuda
import pycuda.driver as cuda
# This import causes pycuda to automatically manage CUDA context creation and cleanup.
import pycuda.autoinit
import glob
import tensorrt as trt

import sys, os
from get_engine import get_engine
import common
import random
import cv2
import time

# You can set the logger severity higher to suppress messages (or lower to display more messages).
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)



def main():
    print("start trt2")
    batch_size = 1

    SHOW_IMAGE=True

    data_root="../Image/gratin_od/images"

    # Get files
    files= glob.glob(data_root+"/*.*")
    print("Got file names")
    #random.shuffle(files)
    files.sort()
    print(files[75])
    
    with get_engine("./models/gratin.onnx", "./models/gratin.trt") as engine, \
        engine.create_execution_context() as context:
        # Build an engine, allocate buffers and create a stream.
        # For more information on buffer allocation, refer to the introductory samples.
        inputs, outputs, bindings, stream = common.allocate_buffers(engine)
        print("allocate buffers")

        #for i in range(len(files)):
        for i in range(1):
            x_cv = cv2.imread(str(files[75]))
            start = time.time()
            x_show = cv2.cvtColor(x_cv, cv2.COLOR_BGR2RGB)
            x_show = cv2.resize(x_show, (640,640))

            inputs[0].host = x_show
            output= common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

            print(output)   
            end = time.time()
            
            if(SHOW_IMAGE):
                cv2.imshow("image", x_cv)
                cv2.waitKey(0)&0xFF

    print("END")
if __name__ == '__main__':
    main()

Hi,

Thanks for your patience.

The common.allocate_buffers() implementation can be found at the following location:

/usr/src/tensorrt/samples/python/common.py

It allocates pinned host_mem and device_mem buffer pairs and then adds them to the bindings variable.
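
Simplified, the sample does roughly the following (a sketch, not the exact file):

class HostDeviceMem(object):
    # Pairs a pinned (page-locked) host buffer with its device buffer.
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

def allocate_buffers(engine):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Pinned host memory and a matching device allocation.
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream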

But inputs[0].host = x_show will break this: x_show is not pinned memory, and the assignment replaces the expected host_mem pointer.

To fix this, please use a memory copy instead. For example:

...
x_show = cv2.resize(x_show, (640,640))

np.copyto(inputs[0].host, x_show.ravel())
output= common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
...
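
Note also that the pinned host buffer is float32 while x_show is a uint8 HWC image, so the values and layout still need to match what the network expects. A YOLOv5 ONNX export usually takes a normalized float32 NCHW input and, if the Detect layer is kept in the export, produces a main output of shape (1, N, 5 + num_classes) laid out as [cx, cy, w, h, objectness, class scores…]. Under those assumptions (the thresholds and helper names below are just examples, not part of the sample), the pre- and post-processing look roughly like this:

import cv2
import numpy as np

def preprocess(bgr_image, size=640):
    # BGR -> RGB, resize, scale to [0, 1], HWC -> CHW, add a batch dimension.
    img = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (size, size)).astype(np.float32) / 255.0
    img = np.transpose(img, (2, 0, 1))[np.newaxis, ...]
    return np.ascontiguousarray(img)

def decode(pred, conf_thres=0.25, nms_thres=0.45):
    # pred: (N, 5 + num_classes) array from the Detect head.
    obj = pred[:, 4]
    cls_scores = pred[:, 5:]
    class_ids = cls_scores.argmax(axis=1)
    confidences = obj * cls_scores[np.arange(len(pred)), class_ids]
    keep = confidences > conf_thres
    boxes, confidences, class_ids = pred[keep, :4], confidences[keep], class_ids[keep]
    # Center format (cx, cy, w, h) -> top-left format (x, y, w, h) for NMS.
    xywh = boxes.copy()
    xywh[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    xywh[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    idx = cv2.dnn.NMSBoxes(xywh.tolist(), confidences.tolist(), conf_thres, nms_thres)
    idx = np.array(idx).flatten() if len(idx) else np.array([], dtype=int)
    return xywh[idx], confidences[idx], class_ids[idx]

# Hypothetical usage with the buffers above:
# blob = preprocess(x_cv)                        # float32, shape (1, 3, 640, 640)
# np.copyto(inputs[0].host, blob.ravel())
# output = common.do_inference(...)
# pred = output_i.reshape(-1, 5 + num_classes)   # pick the Detect output binding
# boxes, confs, ids = decode(pred)

Since the engine here has four output bindings, check engine.get_binding_shape for each one. If one binding has shape (1, N, 5 + num_classes), that is the one to decode; if the export only produced the three raw per-scale feature maps, the grid/anchor decoding from the Detect layer has to be reproduced as well.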

Thanks.
