TensorRT with ROS on Jetson AGX Xavier: Engine (de-)serialization failure

Description

Hi there, I am trying to get a Python inference script running that uses ROS to receive images and TensorRT to do inference.
The inference works totally fine without ROS; no errors occur.

However, as soon as I merely import the cv_bridge module into the script, the engine creation

engine = runtime.deserialize_cuda_engine(f.read())

fails with the following error messages:

[TRT] [E] 6: [libLoader.h::DynamicLibrary::49] Error Code 6: Internal Error (Unable to load library: libcublas.so.10)
[TRT] [E] 1: [engine.cpp::deserialize::681] Error Code 1: Serialization (Serialization assertion postDeserializationCheck() failed.Post deserialization check failure)
[TRT] [E] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)

I worked around the first error by adding the following line to the Python script:

libcublas = ctypes.CDLL('/usr/local/cuda-10.2/targets/aarch64-linux/lib/libcublas.so.10')

I don’t know if that is the correct fix, and the two other errors still remain. It seems to me that cv_bridge is somehow conflicting with TensorRT. Is this a known issue, and how could I resolve it?

Steps To Reproduce

Below the code in Python to reproduce:

import ctypes
libgcc_s = ctypes.CDLL('libgcc_s.so.1')
libcublas = ctypes.CDLL('/usr/local/cuda-10.2/targets/aarch64-linux/lib/libcublas.so.10')
# Custom plugin ops from mmdeploy
ctypes.CDLL("/mmdeploy_installs/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so")
import tensorrt as trt
import cv2
from cv_bridge import CvBridge
logger = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(logger, namespace="")
engine_file = "end2end.engine"
with open(engine_file, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

I am using the NVIDIA L4T ML Docker container for JetPack 4.6.1, together with ROS Noetic. I know Noetic targets Ubuntu 20.04 rather than Ubuntu 18.04, but according to this Dockerfile from dusty-nv it should work nonetheless, and it does work for other ROS nodes on my system.

Environment

TensorRT Version: 8.2.1.8
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: Linux 4.9.253-tegra
Python Version (if applicable): 3.6
Baremetal or Container (if container which image + tag): l4t-ml:r32.7.1-py3

Moving this post to the Jetson AGX Xavier forum to get better help.

Thank you.

Hi,

This looks like a Jetson issue. Please refer to the samples below in case they are useful.

For any further assistance, we will move this post to the Jetson-related forum.

Thanks!

Hi @user152354, is it the cv_bridge or the cv2 module that causes the issue? I’m not sure why this occurs, but for the time being, are you able to load your TensorRT engine before importing these other modules? Or perhaps load your TRT engine in a separate Python submodule that doesn’t import cv_bridge.
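
For example, something like this might be worth a quick test (an untested sketch, just to check whether import order is the trigger; the paths and engine filename are taken from your repro script):

import ctypes
ctypes.CDLL("/mmdeploy_installs/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so")  # mmdeploy custom plugins
import tensorrt as trt

# Deserialize the engine *before* cv2/cv_bridge are imported.
logger = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(logger, namespace="")
with open("end2end.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Only now pull in the image modules and see if the engine loaded fine.
import cv2
from cv_bridge import CvBridge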

Hi!

I think the problem might be that I accidentally built cv_bridge against Python 2.7 instead of Python 3.6 by changing line 11 of the CMakeLists.txt of cv_bridge from

find_package(Boost REQUIRED python37)

to

find_package(Boost REQUIRED python)

I guess I should have changed it to find_package(Boost REQUIRED python3) for it to work properly with the Python 3.6 that is installed on the system.
I will install cv_bridge again and come back with results.
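
A quick way to verify the rebuilt cv_bridge against the interpreter (a sketch; the import raises a Boost-related error if cv_bridge was built for the wrong Python):

# Fails if cv_bridge's Boost.Python module was built for a different Python.
from cv_bridge.boost.cv_bridge_boost import getCvType
print(getCvType("bgr8"))  # prints the OpenCV type code if the build matches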

Thanks!

Hi @dusty_nv

cv_bridge not being built correctly for Python 3.6 really was the problem.

Now I am writing an inference ROS node. I would like to initialize the TRT engine and the execution context during the initialization of the node, because they take quite some time to set up, and then do the inferencing in a callback that subscribes to an image topic. However, this does not work.

I am getting the following error message:

File "/catkin_ws/src/arc_bridge_detection/src/bridge_detection_node.py", line 146, in processImages
    cuda.memcpy_htod(self.input_binds[0]['allocation'], np.ascontiguousarray(img_data))
pycuda._driver.LogicError: cuMemcpyHtoD failed: invalid device context

So somehow I cannot access the engine/context inside the ROS callback.

Below is my (very simple) code:

#!/usr/bin/env python3

import ctypes
# Load mmdeploy custom plugins for TRT
ctypes.CDLL("/mmdeploy_installs/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so")

# Inferencing and image processing
import tensorrt as trt
import cv2
import pycuda.driver as cuda
import pycuda.autoinit

# ROS
import rospy
import rospkg
from sensor_msgs.msg import Image
import message_filters
from cv_bridge import CvBridge, CvBridgeError

# General
import numpy as np
import time
rospy.loginfo("Imports done!")

class BridgeDetectionNode:
    def __init__(self, engine_file):
        # engine_file: The path to a serialized TensorRT engine to load from disk

        ### General initializations ######################################################################
        self.class_names = ['Can', 'Carton/Paper', 'GlassBottle', 'Other',
                            'PlasticBottle', 'PlasticOther', 'Wrapper']


        ### TRT Inferencing initializations###############################################################
        # Define TRT Log level        
        self.logger = trt.Logger(trt.Logger.ERROR)

        # Initialize libnvinfer plugins
        trt.init_libnvinfer_plugins(self.logger, namespace="")

        # Load TRT engine
        rospy.loginfo("1")
        print("Reading engine from file {}".format(engine_file))
        with open(engine_file, "rb") as f, trt.Runtime(self.logger) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())        
        rospy.loginfo("2")
        # Create execution context
        self.context = self.engine.create_execution_context()
        rospy.loginfo("3")

        # Extract info about input and output bindings of the model
        self.input_binds = []
        self.output_binds = []
        self.allocations = []

        for binding in self.engine:
            binding_idx = self.engine.get_binding_index(binding)
            name = self.engine.get_binding_name(binding_idx) # Bindings correspond to the input and outputs of the network
            dtype = self.engine.get_binding_dtype(binding_idx)
            shape = self.engine.get_binding_shape(binding_idx)
            size = np.dtype(trt.nptype(dtype)).itemsize
            for s in shape:
                size *= s

            allocation = cuda.mem_alloc(size)
            bind_description = {
                'index': binding_idx,
                'name': name,
                'dtype': np.dtype(trt.nptype(dtype)),
                'shape': list(shape),
                'allocation': allocation,
            }
            self.allocations.append(allocation)

            if self.engine.binding_is_input(binding):
                self.input_binds.append(bind_description)
            else:
                self.output_binds.append(bind_description)

        # Specify shape and size of output buffer
        self.output_spec = []
        for o in self.output_binds:
            self.output_spec.append((o['shape'], o['dtype']))
        # Create output buffer
        self.outputs = []
        for shape, dtype in self.output_spec:
            self.outputs.append(np.zeros(shape, dtype))

        rospy.loginfo("Initialized TRT for Inferencing")

        ### ROS initializations #######################################################################
        self.bridge = CvBridge()
        self.sub_rgb = rospy.Subscriber("/image_broker/camera/image_raw",Image, self.processImages)
        rospy.loginfo("Initialized ROS")

    ### Functions ####
    def processImages(self, img_msg):

        # Convert image_msg to cv image
        try:
            cv_img = self.bridge.imgmsg_to_cv2(img_msg, desired_encoding="bgra8")
        except CvBridgeError as e:
            print(e)
            return

        # Preprocess the image for inferencing
        self.img_data = self.preprocess(cv_img)
        rospy.loginfo(self.img_data)

        cuda.memcpy_htod(self.input_binds[0]['allocation'], np.ascontiguousarray(self.img_data))
        self.context.execute_v2(self.allocations)
        for o in range(len(self.outputs)):
            cuda.memcpy_dtoh(self.outputs[o], self.output_binds[o]['allocation'])


    def preprocess(self, image):
        # Mean normalization
        mean = np.array([0.485, 0.456, 0.406]).astype('float32')
        stddev = np.array([0.229, 0.224, 0.225]).astype('float32')
        image = np.asarray(image).astype('float32')
        # The Camera has an alpha channel (rgba), let's leave it away (-->rgb)
        image = image[:,:,:3]

        image = (image / 255.0 - mean) / stddev
        # Switch from HWC to CHW order
        image = np.moveaxis(image, 2, 0)
        image = image[np.newaxis,...]

        return image

if __name__ == '__main__':
    rospy.init_node('bridge_detection', anonymous=False)
    engine_file = "/mmdeploy_installs/mmdeploy/exports/tensorrt_22_05_17_fp16_static_480x640/end2end.engine"
    bridge_detection_node = BridgeDetectionNode(engine_file)
    
    rospy.spin()

I am struggling to find a solution to this… could you maybe advise me on how to overcome this issue? What is the best practice for declaring the engine and execution context when using ROS?

Thanks a lot!

Hi @user152354, I don’t really have experience with rospy and pycuda (my ROS nodes are in C++), but my guess is that the ROS callback is occurring in a different thread or process, and the pycuda context hasn’t been initialized there. You may want to check these pycuda context management functions to see if they help:

https://documen.tician.de/pycuda/driver.html#devices-and-contexts

Just a thought: you may need to call something from there the first time the callback gets run by ROS.
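
For example (an untested sketch of that idea, using an explicit context instead of pycuda.autoinit; the class and member names are placeholders):

import pycuda.driver as cuda

class TrtNode:
    def __init__(self):
        cuda.init()
        # Create an explicit context; it becomes current in this (main) thread.
        self.cuda_ctx = cuda.Device(0).make_context()
        # ... deserialize engine, create execution context, cuda.mem_alloc ...
        self.cuda_ctx.pop()  # detach from this thread once init is done

    def processImages(self, img_msg):
        # ROS may run this callback in another thread, so make the
        # context current here before any CUDA calls.
        self.cuda_ctx.push()
        try:
            pass  # memcpy_htod / execute_v2 / memcpy_dtoh as in your node
        finally:
            self.cuda_ctx.pop()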

Thanks for your help @dusty_nv.

I solved the issue by saving the incoming image inside the callback as a class variable and then running the inference outside the callback in a separate ROS loop:

rate = rospy.Rate(10)
while not rospy.is_shutdown():
    # do inference
    rate.sleep()

Works flawlessly :)
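
In full, the pattern looks roughly like this (a minimal sketch with the inference details elided; the topic and names are from my node above):

import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class BridgeDetectionNode:
    def __init__(self):
        self.bridge = CvBridge()
        self.latest_img = None
        # The callback only stores the image; no CUDA calls happen here.
        self.sub_rgb = rospy.Subscriber("/image_broker/camera/image_raw",
                                        Image, self.storeImage)

    def storeImage(self, img_msg):
        self.latest_img = self.bridge.imgmsg_to_cv2(img_msg, desired_encoding="bgra8")

    def run(self):
        # Inference runs in the main thread, where the pycuda context
        # from pycuda.autoinit is current.
        rate = rospy.Rate(10)
        while not rospy.is_shutdown():
            if self.latest_img is not None:
                pass  # preprocess + memcpy_htod / execute_v2 / memcpy_dtoh
            rate.sleep()

if __name__ == '__main__':
    rospy.init_node('bridge_detection', anonymous=False)
    BridgeDetectionNode().run()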
