TensorRT ROS2 Node

Hey all,

I have relatively little experience with TensorRT, but I was able to create a Python ROS 2 node that runs an object detection network from a serialized engine file, based on the Python API example.

My setup is as follows:
- Ubuntu 20.04
- RTX 3080 Ti
- NVIDIA Docker container for ROS 2 (althack, foxy-cuda-gazebo-nvidia with dev as base)

The ROS2 node initializes the following parts:

    self.serialized_engine = None
    self.engine = None
    self.ctx = None
    self.stream = None
    self.TRT_ENGINE_DATATYPE = trt.DataType.FLOAT
    self.logger = trt.Logger(trt.Logger.WARNING)
    self.runtime = trt.Runtime(self.logger)
    with open(r"your/path/", "rb") as f:
        self.serialized_engine = f.read()
    self.engine = self.runtime.deserialize_cuda_engine(self.serialized_engine)
    self.ctx = self.engine.create_execution_context()
    self.input_volume = trt.volume(self.INPUT_SHAPE)
    self.device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

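For context, the camera frames have to be converted to the engine's input layout before inference. A minimal sketch of such a preprocessing step (the `INPUT_SHAPE` value and the nearest-neighbour resize are placeholder assumptions, not my exact pipeline):

```python
import numpy as np

# Hypothetical input shape; substitute self.INPUT_SHAPE from the node.
INPUT_SHAPE = (3, 640, 640)

def preprocess(image):
    """Convert an HWC uint8 camera frame to the CHW float32 layout the
    engine expects, using a naive nearest-neighbour resize as a stand-in
    for a proper resize (e.g. cv2.resize)."""
    c, h, w = INPUT_SHAPE
    ys = np.linspace(0, image.shape[0] - 1, h).astype(int)
    xs = np.linspace(0, image.shape[1] - 1, w).astype(int)
    resized = image[ys][:, xs]                               # (h, w, 3)
    chw = resized.transpose(2, 0, 1).astype(np.float32) / 255.0
    return np.ascontiguousarray(chw)
```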
The callback function of the node performs the inference. The TensorRT-related parts are listed below as well:

    # Fetch output from the model
    [predictions_raw] = **self.do_inference_gpu()**

The function definitions look like this:

def do_inference_gpu(self):

    # Transfer inputs to GPU
    for i, inp in enumerate(self.inputs):
        self.inputs[i] = inp.to('cuda:0')  # Move the input tensor to the GPU
    # Execute the model
    **self.ctx.execute_async_v2(bindings=self.bindings, stream_handle=self.stream.handle)**
    # Synchronize the stream
    self.stream.synchronize()

    # Create PyTorch tensors from the output buffers without transferring to CPU
    output_tensors = [torch.as_tensor(out) for out in self.outputs]

    return output_tensors
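The binding names `NMS`/`NMS_1` suggest the network's NMS plugin produces a keep count plus flat 7-value detection records. Assuming that layout (both the layout and the threshold are assumptions on my side), the raw output could be decoded roughly like this:

```python
import numpy as np

def parse_nms_output(nms_out, keep_count, conf_threshold=0.5):
    """Hypothetical parser for a flat NMS-plugin output.

    nms_out:     flat array of 7-tuples
                 (image_id, label, confidence, x1, y1, x2, y2)
    keep_count:  number of valid detections (the NMS_1 binding)
    """
    dets = np.asarray(nms_out, dtype=np.float32).reshape(-1, 7)
    dets = dets[: int(keep_count)]                 # drop padding entries
    return dets[dets[:, 2] >= conf_threshold]      # confidence filter
```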

def allocate_buffers_gpu(self):
    """Allocates device buffer for TRT engine inference on the GPU.

        engine (trt.ICudaEngine): TensorRT engine

        inputs [HostDeviceMem]: engine input memory on GPU
        outputs [HostDeviceMem]: engine output memory on GPU
        bindings [int]: buffer to device bindings
        stream (cuda.Stream): cuda stream for engine inference synchronization
    """
    self.inputs = []
    self.outputs = []
    self.bindings = []
    self.stream = cuda.Stream()

    binding_to_type = {"Input": torch.float32, "NMS": torch.float32, "NMS_1": torch.int32,
                       "images": torch.float32, "output0": torch.float32}

    for binding in self.engine:
        size = trt.volume(self.engine.get_binding_shape(binding))
        dtype = binding_to_type[str(binding)]

        # Allocate a device buffer on the GPU
        device_mem = torch.empty(size, dtype=dtype, device='cuda')
        # Append the device buffer's pointer to the device bindings
        self.bindings.append(int(device_mem.data_ptr()))
        # Append to the appropriate list
        if self.engine.binding_is_input(binding):
            self.inputs.append(device_mem)
        else:
            self.outputs.append(device_mem)

    return True
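As a side note, `trt.volume` returns a plain Python int (the element count of the binding shape), which is what `torch.empty` expects as a size. A pure-Python equivalent for sanity-checking buffer sizes offline (a sketch, valid only for static shapes without -1 dimensions):

```python
from functools import reduce
import operator

def volume(shape):
    """Element count of a static binding shape; mirrors trt.volume
    for fully specified (no -1) dimensions."""
    return reduce(operator.mul, shape, 1)
```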

When I start the node, everything works properly and ROS 2 does not crash. Sadly, I still get this error message from TensorRT:

    [10/06/2023-17:26:43] [TRT] [E] 1: [reformatRunner.cpp::execute::603] Error Code 1: Cuda Runtime (invalid resource handle)

I highlighted the relevant line of code above. The network no longer produces any output. What could be the reason that the CUDA stream or the binding handle is invalid? If needed, I can share more information.

I’m grateful for any help.

It appears there was some mistake in the engine-loading script. Please refer to the references below, which may help you.