Hey all,
I have relatively little experience with TensorRT, but I was able to create a Python ROS 2 node that runs an object detection network from a serialized engine file, based on the Python API example.
My setup is as follows:
- Ubuntu 20.04
- RTX 3080 Ti
- NVIDIA Docker container for ROS 2 (althack, foxy-cuda-gazebo-nvidia with dev as base)
The ROS 2 node initializes the following members:
import tensorrt as trt
import torch
import pycuda.driver as cuda
import pycuda.autoinit  # assumption: some CUDA context must exist before cuda.Stream() is created

self.serialized_engine = None
self.engine = None
self.ctx = None
self.stream = None
self.TRT_ENGINE_DATATYPE = trt.DataType.FLOAT
self.logger = trt.Logger(trt.Logger.WARNING)
self.runtime = trt.Runtime(self.logger)
# Read the serialized engine from disk and deserialize it
with open(r"your/path/", "rb") as f:
    serialized_engine = f.read()
self.engine = self.runtime.deserialize_cuda_engine(serialized_engine)
**self.allocate_buffers_gpu()**
self.ctx = self.engine.create_execution_context()
self.input_volume = trt.volume(self.INPUT_SHAPE)
self.device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
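For reference, this is how the deserialized engine's bindings can be inspected, to verify the names and dtypes used later in allocate_buffers_gpu (a small sketch using the TensorRT 8.x binding API):

# Sketch: list the engine bindings (TensorRT 8.x binding API)
for i in range(self.engine.num_bindings):
    name = self.engine.get_binding_name(i)
    shape = self.engine.get_binding_shape(i)
    dtype = self.engine.get_binding_dtype(i)
    print(f"binding {i}: {name} shape={shape} dtype={dtype} input={self.engine.binding_is_input(i)}")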
The callback function of the node performs the inference. The TensorRT-related parts are listed below as well:
self.inputs[0].copy_(imgs_norm2.view(-1))
# Fetch output from the model
[predictions_raw] = **self.do_inference_gpu()**
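For context, the surrounding callback roughly follows this pattern (a simplified sketch; the cv_bridge handle self.bridge and the normalization are placeholders, not my exact code):

def image_callback(self, msg):
    # Simplified sketch of the callback; preprocessing details are placeholders
    img = self.bridge.imgmsg_to_cv2(msg, desired_encoding='rgb8')
    imgs_norm2 = torch.from_numpy(img).to(self.device).float() / 255.0
    imgs_norm2 = imgs_norm2.permute(2, 0, 1).unsqueeze(0).contiguous()  # HWC -> NCHW
    # Copy into the preallocated GPU input buffer
    self.inputs[0].copy_(imgs_norm2.view(-1))
    # Fetch output from the model
    [predictions_raw] = self.do_inference_gpu()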
The function definitions look like this:
def do_inference_gpu(self):
    # Transfer inputs to the GPU (a no-op here, since the buffers were
    # allocated on the GPU; if .to() ever created a new tensor, its pointer
    # would no longer match the one stored in self.bindings)
    for i, inp in enumerate(self.inputs):
        self.inputs[i] = inp.to('cuda:0')
    # Execute the model
    **self.ctx.execute_async_v2(bindings=self.bindings, stream_handle=self.stream.handle)**
    # Synchronize the stream
    self.stream.synchronize()
    # Create PyTorch tensors from the output buffers without transferring to CPU
    output_tensors = [torch.as_tensor(out) for out in self.outputs]
    return output_tensors
def allocate_buffers_gpu(self):
    """Allocates device buffers for TRT engine inference on the GPU.

    Fills self.inputs / self.outputs with torch tensors living on the GPU,
    fills self.bindings with their raw device pointers, and creates the
    CUDA stream used for inference synchronization.
    """
    self.inputs = []
    self.outputs = []
    self.bindings = []
    self.stream = cuda.Stream()
    binding_to_type = {"Input": torch.float32, "NMS": torch.float32, "NMS_1": torch.int32,
                       "images": torch.float32, "output0": torch.float32}
    for binding in self.engine:
        size = trt.volume(self.engine.get_binding_shape(binding))
        dtype = binding_to_type[str(binding)]
        # Allocate the buffer directly on the GPU
        device_mem = torch.empty(size, dtype=dtype, device='cuda')
        # Register the raw device pointer with the bindings
        self.bindings.append(int(device_mem.data_ptr()))
        # Sort the buffer into the input or output list
        if self.engine.binding_is_input(binding):
            self.inputs.append(device_mem)
        else:
            self.outputs.append(device_mem)
    return True
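Since the buffers are allocated through PyTorch anyway, would it be more consistent to take the stream from PyTorch as well, so that the tensors, the stream, and TensorRT all share the same CUDA context? Something like this (untested sketch):

# In __init__ (instead of cuda.Stream()):
self.torch_stream = torch.cuda.Stream()

# Untested variant of do_inference_gpu:
def do_inference_gpu_torch(self):
    # Launch on the PyTorch-owned stream and wait for completion
    self.ctx.execute_async_v2(bindings=self.bindings,
                              stream_handle=self.torch_stream.cuda_stream)
    self.torch_stream.synchronize()
    return [torch.as_tensor(out) for out in self.outputs]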
When I start the node, everything appears to work and ROS 2 does not crash. Sadly, I still get the following error message from TensorRT:
[10/06/2023-17:26:43] [TRT] [E] 1: [reformatRunner.cpp::execute::603] Error Code 1: Cuda Runtime (invalid resource handle)
I highlighted the offending line of code above. The network no longer produces any output. What could cause the CUDA stream or the binding handle to be invalid? If needed, I can share more information.
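One thing I am unsure about: the ROS 2 executor may run the callback on a different thread than __init__, and PyCUDA contexts are bound to the thread that created them. Would I need to make the context current in the callback, something like this (sketch; self.cuda_ctx created via cuda.Device(0).make_context() in __init__ is hypothetical, I do not do this yet)?

# Sketch: explicitly bind the PyCUDA context in the callback thread.
# Assumes self.cuda_ctx = cuda.Device(0).make_context() in __init__ (hypothetical).
self.cuda_ctx.push()
try:
    [predictions_raw] = self.do_inference_gpu()
finally:
    self.cuda_ctx.pop()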
I’m grateful for any help.