My TensorRT engine gives a null (empty list)

Description

A clear and concise description of the bug or issue.
YOLOv8.pt file inorder to load

Environment

1. get the onnx ✅ 2. convert to .trt ✅ 3. use the stack solution to infer using the engine (# Output is empty)

Hello NVIDIA Community,

I am working on a project that involves using TensorRT for real-time object detection. I have successfully converted my YOLOv8 model to a TensorRT engine (.trt file). However, I am facing an issue where the engine produces empty outputs during inference.

System Details:

  • TensorRT Version: [, 10.3.0]
  • Cuda compilation tools, release ** 12.6, V12.6.20
  • Build cuda_12.6.r12.6/compiler.34431801_0
  • Python Version: [, 3.11]
  • Environment: Google Colab with NVIDIA A100 GPU

Problem Description:

I have implemented a Python class to handle inference using the TensorRT engine. The engine loads without any issues, and all necessary bindings are allocated. However, when I run inference, the output is unexpectedly empty. The input data is correctly prepared and copied to the device, and the inference process runs without any apparent errors. Yet, the results are empty or contain only zeros.

Here’s an outline of the code I’m using for inference:

I built my code using this stack overflow solution
Stackoverflow solution

I changed it inorder to adapt it to version 10.03 of TensorRT
How ever the result gives 0 as output

Code to conver to onnx

!yolo export model=/content/yolov8n.pt format=onnx imgsz=640,640 #Step 1

I created the engine using this code t in google collab

!./trtexec --onnx=/content/yolov8n.onnx --saveEngine=/content/yolov8n.engine

[quote=“arshmansuriadd, post:1, topic:305005, full:true”]

Description

A clear and concise description of the bug or issue.
YOLOv8.pt file inorder to load

Environment

TensorRT Version: 10.03
GPU Type: A100
Nvidia Driver Version: Mentioned in the notebook
Operating System + Version: Google Collab so ubuntu

Steps To Reproduce

Google collab link
Please include:

  • Download the .pt file linked
    *Convert to onnx (after downloading the necessary libraries)
  • Conver to .engine (After Downloading the libs like tensor RT)
    [/quote]

[quote=“arshmansuriadd, post:1, topic:305005, full:true”]

Description

A clear and concise description of the bug or issue.
YOLOv8.pt file inorder to load

Environment

TensorRT Version: 10.03
GPU Type: A100
Nvidia Driver Version: Mentioned in the notebook
Operating System + Version: Google Collab so ubuntu

Steps To Reproduce

This is what I am using to make an inference

import tensorrt as trt
import numpy as np
import os
from ultralytics import YOLO
import pycuda.driver as cuda
import pycuda.autoinit



class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

class TrtModel:

    def __init__(self,engine_path,max_batch_size=1,dtype=np.float32):

        self.engine_path = engine_path
        self.dtype = dtype
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        self.engine = self.load_engine(self.runtime, self.engine_path)
        self.max_batch_size = max_batch_size
        self.inputs, self.outputs, self.bindings, self.stream = self.allocate_buffers()
        self.context = self.engine.create_execution_context()



    @staticmethod
    def load_engine(trt_runtime, engine_path):
        trt.init_libnvinfer_plugins(None, "")
        with open(engine_path, 'rb') as f:
            engine_data = f.read()
        engine = trt_runtime.deserialize_cuda_engine(engine_data)
        return engine

    def allocate_buffers(self):

        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()

        for binding in self.engine:
            size = trt.volume(self.engine.get_tensor_shape(binding)) * self.max_batch_size
            host_mem = cuda.pagelocked_empty(size, self.dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)

            bindings.append(int(device_mem))

            if self.engine.get_tensor_mode(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))

        return inputs, outputs, bindings, stream


    def __call__(self,x:np.ndarray,batch_size=2):

        x = x.astype(self.dtype)

        np.copyto(self.inputs[0].host,x.ravel())

        for inp in self.inputs:
            cuda.memcpy_htod_async(inp.device, inp.host, self.stream)


        tensor_name = engine.get_tensor_name(0) # input tensor


        self.context.execute_async_v3(stream_handle=self.stream.handle)#Problem!!
        for out in self.outputs:
            cuda.memcpy_dtoh_async(out.host, out.device, self.stream)


        self.stream.synchronize()
        return [out.host.reshape(batch_size,-1) for out in self.outputs]




if __name__ == "__main__":

    batch_size = 1
    trt_engine_path ="/content/yolov8n.engine" #"/content/onnx-tensorrt/yolov8n.trt"#os.path.join("..","models","main.trt")
    model = TrtModel(trt_engine_path)
    binding_name=model.engine[0]
    shape = (3, 640, 640) #model.engine.get_tensor_shape(binding_name)
    # Create an array with random integers between 0 and 255
    data = np.random.randint(0, 255, (batch_size, *shape), dtype=np.uint8)

    # Normalize the data to the range [0, 1]
    data = data.astype(np.float32) / 255.0
    result = model(data,batch_size)


Hi @arshmansuriadd ,
Can you please help us with additional context?

Thanks

Also, the notebook you shared is restricted access.