Hey all,
I have relatively little experience with TensorRT, but I was able to create a Python ROS 2 node that runs an object detection network from a serialized engine file, based on the Python API example.
My setup is as follows:
- Ubuntu 20.04
- RTX 3080 Ti
- NVIDIA Docker container for ROS 2 (althack, foxy-cuda-gazebo-nvidia with dev as base)
The ROS 2 node initializes the following members:
import tensorrt as trt
import torch
import pycuda.driver as cuda
import pycuda.autoinit  # assumption: some CUDA context must exist before cuda.Stream() is created

self.serialized_engine = None
self.engine = None
self.ctx = None
self.stream = None
self.TRT_ENGINE_DATATYPE = trt.DataType.FLOAT
self.logger = trt.Logger(trt.Logger.WARNING)
self.runtime = trt.Runtime(self.logger)
# Read the serialized engine from disk and deserialize it
with open(r"your/path/", "rb") as f:
    serialized_engine = f.read()
self.engine = self.runtime.deserialize_cuda_engine(serialized_engine)
**self.allocate_buffers_gpu()**
self.ctx = self.engine.create_execution_context()
self.input_volume = trt.volume(self.INPUT_SHAPE)
self.device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
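For reference, this is how the deserialized engine's bindings can be inspected, to verify the names and dtypes used later in allocate_buffers_gpu (a small sketch using the TensorRT 8.x binding API):

# Sketch: list the engine bindings (TensorRT 8.x binding API)
for i in range(self.engine.num_bindings):
    name = self.engine.get_binding_name(i)
    shape = self.engine.get_binding_shape(i)
    dtype = self.engine.get_binding_dtype(i)
    print(f"binding {i}: {name} shape={shape} dtype={dtype} input={self.engine.binding_is_input(i)}")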
The callback function of the node performs the inference. The TensorRT-related parts are listed below as well:
self.inputs[0].copy_(imgs_norm2.view(-1))
# Fetch output from the model
[predictions_raw] = **self.do_inference_gpu()**
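For context, the surrounding callback roughly follows this pattern (a simplified sketch; the cv_bridge handle self.bridge and the normalization are placeholders, not my exact code):

def image_callback(self, msg):
    # Simplified sketch of the callback; preprocessing details are placeholders
    img = self.bridge.imgmsg_to_cv2(msg, desired_encoding='rgb8')
    imgs_norm2 = torch.from_numpy(img).to(self.device).float() / 255.0
    imgs_norm2 = imgs_norm2.permute(2, 0, 1).unsqueeze(0).contiguous()  # HWC -> NCHW
    # Copy into the preallocated GPU input buffer
    self.inputs[0].copy_(imgs_norm2.view(-1))
    # Fetch output from the model
    [predictions_raw] = self.do_inference_gpu()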
The function definitions look like this:
def do_inference_gpu(self):
    # Transfer inputs to the GPU (a no-op here, since the buffers were
    # allocated on the GPU; if .to() ever created a new tensor, its pointer
    # would no longer match the one stored in self.bindings)
    for i, inp in enumerate(self.inputs):
        self.inputs[i] = inp.to('cuda:0')
    # Execute the model
    **self.ctx.execute_async_v2(bindings=self.bindings, stream_handle=self.stream.handle)**
    # Synchronize the stream
    self.stream.synchronize()
    # Create PyTorch tensors from the output buffers without transferring to CPU
    output_tensors = [torch.as_tensor(out) for out in self.outputs]
    return output_tensors
def allocate_buffers_gpu(self):
    """Allocates device buffers for TRT engine inference on the GPU.

    Fills self.inputs / self.outputs with torch tensors living on the GPU,
    fills self.bindings with their raw device pointers, and creates the
    CUDA stream used for inference synchronization.
    """
    self.inputs = []
    self.outputs = []
    self.bindings = []
    self.stream = cuda.Stream()
    binding_to_type = {"Input": torch.float32, "NMS": torch.float32, "NMS_1": torch.int32,
                       "images": torch.float32, "output0": torch.float32}
    for binding in self.engine:
        size = trt.volume(self.engine.get_binding_shape(binding))
        dtype = binding_to_type[str(binding)]
        # Allocate the buffer directly on the GPU
        device_mem = torch.empty(size, dtype=dtype, device='cuda')
        # Register the raw device pointer with the bindings
        self.bindings.append(int(device_mem.data_ptr()))
        # Sort the buffer into the input or output list
        if self.engine.binding_is_input(binding):
            self.inputs.append(device_mem)
        else:
            self.outputs.append(device_mem)
    return True
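Since the buffers are allocated through PyTorch anyway, would it be more consistent to take the stream from PyTorch as well, so that the tensors, the stream, and TensorRT all share the same CUDA context? Something like this (untested sketch):

# In __init__ (instead of cuda.Stream()):
self.torch_stream = torch.cuda.Stream()

# Untested variant of do_inference_gpu:
def do_inference_gpu_torch(self):
    # Launch on the PyTorch-owned stream and wait for completion
    self.ctx.execute_async_v2(bindings=self.bindings,
                              stream_handle=self.torch_stream.cuda_stream)
    self.torch_stream.synchronize()
    return [torch.as_tensor(out) for out in self.outputs]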
When I start the node, everything appears to work and ROS 2 does not crash. Sadly, I still get the following error message from TensorRT:
[10/06/2023-17:26:43] [TRT] [E] 1: [reformatRunner.cpp::execute::603] Error Code 1: Cuda Runtime (invalid resource handle)
I highlighted the offending line of code above. The network no longer produces any output. What could cause the CUDA stream or the binding handle to be invalid? If needed, I can share more information.
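One thing I am unsure about: the ROS 2 executor may run the callback on a different thread than __init__, and PyCUDA contexts are bound to the thread that created them. Would I need to make the context current in the callback, something like this (sketch; self.cuda_ctx created via cuda.Device(0).make_context() in __init__ is hypothetical, I do not do this yet)?

# Sketch: explicitly bind the PyCUDA context in the callback thread.
# Assumes self.cuda_ctx = cuda.Device(0).make_context() in __init__ (hypothetical).
self.cuda_ctx.push()
try:
    [predictions_raw] = self.do_inference_gpu()
finally:
    self.cuda_ctx.pop()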
I’m grateful for any help.