Description
A TensorRT engine converted from a YOLOv8 .pt model loads correctly but produces empty output at inference time.
1. Export the .pt model to ONNX ✅
2. Convert the ONNX file to a TensorRT engine (.trt) ✅
3. Use the Stack Overflow solution below to run inference with the engine (output is empty)
Hello NVIDIA Community,
I am working on a project that involves using TensorRT for real-time object detection. I have successfully converted my YOLOv8 model to a TensorRT engine (.trt file). However, I am facing an issue where the engine produces empty outputs during inference.
System Details:
- TensorRT Version: 10.3.0
- CUDA: Cuda compilation tools, release 12.6, V12.6.20 (Build cuda_12.6.r12.6/compiler.34431801_0)
- Python Version: 3.11
- NVIDIA Driver Version: mentioned in the linked notebook
- Operating System: Ubuntu (Google Colab)
- Environment: Google Colab with NVIDIA A100 GPU
Problem Description:
I have implemented a Python class to handle inference using the TensorRT engine. The engine loads without any issues, and all necessary bindings are allocated. However, when I run inference, the output is unexpectedly empty. The input data is correctly prepared and copied to the device, and the inference process runs without any apparent errors. Yet, the results are empty or contain only zeros.
Here’s an outline of the code I’m using for inference:
I built my code from this Stack Overflow solution:
Stackoverflow solution
I changed it in order to adapt it to version 10.3.0 of TensorRT. However, the result gives 0 as output.
Code to convert to ONNX:
!yolo export model=/content/yolov8n.pt format=onnx imgsz=640,640 #Step 1
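(The same export can also be done through the ultralytics Python API; this is just the equivalent call, shown for completeness, and I have not verified that it changes the result:)

from ultralytics import YOLO

# Equivalent ONNX export through the Python API (assumes yolov8n.pt is in /content)
model = YOLO("/content/yolov8n.pt")
model.export(format="onnx", imgsz=640)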
I created the engine using this command in Google Colab:
!./trtexec --onnx=/content/yolov8n.onnx --saveEngine=/content/yolov8n.engine
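A quick way to sanity-check what trtexec produced is to print the engine's I/O tensors; here is a small sketch of that check (it only assumes the engine path from the command above):

import tensorrt as trt

# Print the name, I/O mode, shape and dtype of every tensor in the serialized engine
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("/content/yolov8n.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print(name, engine.get_tensor_mode(name), engine.get_tensor_shape(name), engine.get_tensor_dtype(name))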
Steps To Reproduce
Google Colab link
1. Download the linked .pt file
2. Convert it to ONNX (after installing the necessary libraries)
3. Convert the ONNX file to a .engine (after installing the required libraries, such as TensorRT)
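(The library setup in the notebook looks roughly like the following; the exact package list is from memory, so treat it as approximate:)

# Approximate Colab setup before the conversion steps (package list is a guess)
!pip install ultralytics onnx
!pip install tensorrt pycuda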
This is the code I am using to run inference:
import tensorrt as trt
import numpy as np
import os
from ultralytics import YOLO
import pycuda.driver as cuda
import pycuda.autoinit


class HostDeviceMem(object):
    # Pairs a page-locked host buffer with its matching device allocation
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


class TrtModel:
    def __init__(self, engine_path, max_batch_size=1, dtype=np.float32):
        self.engine_path = engine_path
        self.dtype = dtype
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        self.engine = self.load_engine(self.runtime, self.engine_path)
        self.max_batch_size = max_batch_size
        self.inputs, self.outputs, self.bindings, self.stream = self.allocate_buffers()
        self.context = self.engine.create_execution_context()

    @staticmethod
    def load_engine(trt_runtime, engine_path):
        trt.init_libnvinfer_plugins(None, "")
        with open(engine_path, 'rb') as f:
            engine_data = f.read()
        engine = trt_runtime.deserialize_cuda_engine(engine_data)
        return engine

    def allocate_buffers(self):
        # One host/device buffer pair per I/O tensor of the engine
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()
        for binding in self.engine:
            size = trt.volume(self.engine.get_tensor_shape(binding)) * self.max_batch_size
            host_mem = cuda.pagelocked_empty(size, self.dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            bindings.append(int(device_mem))
            if self.engine.get_tensor_mode(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))
        return inputs, outputs, bindings, stream

    def __call__(self, x: np.ndarray, batch_size=2):
        x = x.astype(self.dtype)
        np.copyto(self.inputs[0].host, x.ravel())
        # Copy the input host -> device
        for inp in self.inputs:
            cuda.memcpy_htod_async(inp.device, inp.host, self.stream)
        tensor_name = self.engine.get_tensor_name(0)  # input tensor
        self.context.execute_async_v3(stream_handle=self.stream.handle)  # Problem!!
        # Copy the results device -> host
        for out in self.outputs:
            cuda.memcpy_dtoh_async(out.host, out.device, self.stream)
        self.stream.synchronize()
        return [out.host.reshape(batch_size, -1) for out in self.outputs]


if __name__ == "__main__":
    batch_size = 1
    trt_engine_path = "/content/yolov8n.engine"  # "/content/onnx-tensorrt/yolov8n.trt"  # os.path.join("..", "models", "main.trt")
    model = TrtModel(trt_engine_path)
    binding_name = model.engine[0]
    shape = (3, 640, 640)  # model.engine.get_tensor_shape(binding_name)
    # Create an array with random integers between 0 and 255
    data = np.random.randint(0, 255, (batch_size, *shape), dtype=np.uint8)
    # Normalize the data to the range [0, 1]
    data = data.astype(np.float32) / 255.0
    result = model(data, batch_size)
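From the TensorRT 10 migration notes, my understanding is that execute_async_v3 only works if every I/O tensor address has been registered with context.set_tensor_address beforehand, and that get_tensor_mode should be compared against trt.TensorIOMode.INPUT explicitly (both INPUT and OUTPUT are truthy, so my current check may be sorting every tensor into inputs). Below is a minimal, self-contained sketch of the call pattern I think 10.x expects, assuming a static-shape engine at /content/yolov8n.engine; is this the intended usage?

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

def infer_once(engine_path, x):
    # Minimal TensorRT 10 inference sketch: one input tensor, any number of outputs.
    # Assumes a static-shape engine; shapes and dtypes are read from the engine itself.
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    stream = cuda.Stream()

    host_bufs, dev_bufs, out_names = {}, {}, []
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        dtype = trt.nptype(engine.get_tensor_dtype(name))
        host = cuda.pagelocked_empty(trt.volume(engine.get_tensor_shape(name)), dtype)
        dev = cuda.mem_alloc(host.nbytes)
        host_bufs[name], dev_bufs[name] = host, dev
        # Register every tensor's device address with the context (required by execute_async_v3)
        context.set_tensor_address(name, int(dev))
        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            np.copyto(host, x.astype(dtype).ravel())
            cuda.memcpy_htod_async(dev, host, stream)
        else:
            out_names.append(name)

    context.execute_async_v3(stream_handle=stream.handle)
    for name in out_names:
        cuda.memcpy_dtoh_async(host_bufs[name], dev_bufs[name], stream)
    stream.synchronize()
    return {name: host_bufs[name].copy() for name in out_names}

outputs = infer_once("/content/yolov8n.engine", np.random.rand(1, 3, 640, 640).astype(np.float32))
for name, arr in outputs.items():
    print(name, arr.shape, "non-zero elements:", np.count_nonzero(arr))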