TensorRT inference context in ROS callback



I would like to run a TensorRT inference engine in a ROS callback. If CUDA is auto-initialised and the buffers are allocated in the main thread, inference fails with the errors below:

[TensorRT] ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception

If I manually initialise pycuda in the callback as below,

def ros_callback(msg):
    device = cuda.Device(0)
    context = device.make_context()


    del context

the context created is a CUDA context, not the execution context I want from a deserialized engine:

with engine.create_execution_context() as context:


Could anyone help with my confusion?
Many thanks.


TensorRT Version:
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04, Jetpack 4.4
Python Version (if applicable): 3.6.9
ROS Version: Melodic

Hi @zdai257,
Can you please share your model and script so that I can try to reproduce the issue?

Regarding the context: an engine can have multiple execution contexts, allowing one set of weights to be used for multiple overlapping inference tasks. For example, you can process images in parallel CUDA streams using one engine and one context per stream. Each context will be created on the same GPU as the engine.
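
To illustrate the point about one context per stream, here is a minimal sketch. It assumes a deserialized engine and a stream factory such as pycuda's cuda.Stream. The helper names make_stream_workers and launch_all are hypothetical, and each worker is assumed to have its own list of device bindings.

```python
def make_stream_workers(engine, make_stream, n_streams):
    """Pair each stream with its own execution context from one shared engine."""
    # engine.create_execution_context() is the TensorRT call used elsewhere in
    # this thread; make_stream is assumed to be something like pycuda's cuda.Stream.
    return [(engine.create_execution_context(), make_stream())
            for _ in range(n_streams)]


def launch_all(workers, bindings_per_worker, batch_size=1):
    """Enqueue one inference per (context, stream) pair, then wait for all."""
    for (ctx, stream), bindings in zip(workers, bindings_per_worker):
        # Each context runs on its own stream, so the launches can overlap.
        ctx.execute_async(batch_size=batch_size, bindings=bindings,
                          stream_handle=stream.handle)
    for _, stream in workers:
        stream.synchronize()
```

With pycuda this might be called as make_stream_workers(engine, cuda.Stream, 2), followed by launch_all with one bindings list per worker.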


Thanks for your reply.
I am afraid I cannot share the model, but the problem can be reproduced with ANY serialized engine. The error occurs when the node receives a ROS Empty message on "/empty_topic".

import os
import numpy as np
import tensorrt as trt
import rospy
from std_msgs.msg import Empty
import pycuda.driver as cuda
import pycuda.autoinit

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

def allocate_buffers(eng):
    inputs_list = []
    inputs_rand = []
    outputs_list = []
    bindings_list = []
    stream0 = cuda.Stream()
    for binding in eng:
        size = trt.volume(eng.get_binding_shape(binding)) * eng.max_batch_size
        dtype = np.float32
        # Page-locked host buffer and matching device buffer for this binding.
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # TensorRT expects the device pointers in binding order.
        bindings_list.append(int(device_mem))
        if eng.binding_is_input(binding):
            inputs_list.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs_list.append(HostDeviceMem(host_mem, device_mem))
    return inputs_list, inputs_rand, outputs_list, bindings_list, stream0

def do_inference(ctx, bds, inpts, outs, strm, batch_size=1):
    # Copy inputs to the GPU, run the engine, and copy outputs back on one stream.
    for inp in inpts:
        cuda.memcpy_htod_async(inp.device, inp.host, strm)
    ctx.execute_async(batch_size=batch_size, bindings=bds, stream_handle=strm.handle)
    for out in outs:
        cuda.memcpy_dtoh_async(out.host, out.device, strm)
    # Wait for the async copies to finish before reading the host buffers.
    strm.synchronize()
    return [out.host for out in outs]

def ros_callback(msg):
    with engine.create_execution_context() as context:
        for i in range(len(inputs0)):
            np.copyto(inputs[i].host, inputs0[i].ravel())
        predict = do_inference(context, bindings, inputs, outputs, stream)

with open('ANY.engine', 'rb') as f, trt.Runtime(trt.Logger(trt.Logger.VERBOSE)) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
inputs, inputs0, outputs, bindings, stream = allocate_buffers(engine)

node_h = rospy.init_node('main_node', anonymous=False)
sub_h = rospy.Subscriber("/empty_topic", Empty, ros_callback)
# Keep the node alive so that callbacks are delivered.
rospy.spin()

Hi @zdai257,
Can you share the verbose log with us?

Hi @AakankshaS

Sorry for the late reply. The log after the first ROS callback reads:

[TensorRT] VERBOSE: Deserialize required 3228122 microseconds.
Binding image_1 has dimension = (1, 1, 64, 256, 3)
Binding image_2 has dimension = (1, 1, 64, 256, 3)
Binding imu_data has dimension = (1, 10, 6)
Binding delta_pose has dimension = (1, 1, 6)

[TensorRT] VERBOSE: myelinAllocCb allocated GPU 139136 bytes at 0x23919a000
[TensorRT] ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
[TensorRT] VERBOSE: myelinFreeCb freeing GPU at 0x23919a000


For this specific error, I suggest checking the link below.
The issue might be a driver version compatibility problem.



The program runs on a Jetson AGX Xavier, which uses L4T rather than the desktop NVIDIA driver. Inference also works when it loops in the main thread. The error only occurs in a ROS callback, so I suspect it's a context/threading problem.
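
Since inference works in the main thread but fails in the callback, one possible workaround (an assumption, not something confirmed in this thread) is to keep all GPU work on a single dedicated thread and let the ROS callback only hand over messages. The sketch below shows that pattern in plain Python. InferenceWorker, setup_fn, infer_fn and teardown_fn are hypothetical names, not TensorRT or ROS APIs. The idea is that setup_fn would create the CUDA context, engine and buffers on the worker thread, and infer_fn would run the engine there.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker thread to exit


class InferenceWorker:
    """Runs every inference on one thread; callbacks only enqueue messages."""

    def __init__(self, setup_fn, infer_fn, teardown_fn=None):
        # All three hooks are hypothetical placeholders supplied by the caller.
        self._setup_fn = setup_fn
        self._infer_fn = infer_fn
        self._teardown_fn = teardown_fn
        self._jobs = queue.Queue()
        self.results = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def submit(self, msg):
        # Safe to call from any thread, e.g. from a ROS subscriber callback.
        self._jobs.put(msg)

    def stop(self):
        self._jobs.put(_STOP)
        self._thread.join()

    def _run(self):
        # GPU state is created and used on this thread only.
        state = self._setup_fn()
        try:
            while True:
                msg = self._jobs.get()
                if msg is _STOP:
                    break
                self.results.put(self._infer_fn(state, msg))
        finally:
            if self._teardown_fn is not None:
                self._teardown_fn(state)
```

In a ROS node the subscriber callback would then reduce to something like worker.submit(msg), with results read from worker.results.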

Thanks anyway.

Have you solved this problem?

Hi, I am having the same issue when doing inference in a ROS callback.
The engine is only built on the first callback and saved to a global variable.
I fixed some context issues by using ctx.push() … and then ctx.pop(),
but GPU memory keeps growing until OOM happens.
What should I do?
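
The thread does not establish the cause of the memory growth. Two things that may be worth ruling out are a push() without a matching pop() when an exception occurs, and creating a new execution context on every callback. The sketch below guards against both. It assumes cuda_ctx behaves like a pycuda context (push() and pop()) and that engine offers create_execution_context() as in the script above. pushed, CallbackRunner and run_fn are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def pushed(cuda_ctx):
    """Make cuda_ctx current on the calling thread and always pop it again."""
    cuda_ctx.push()
    try:
        yield cuda_ctx
    finally:
        # Runs even if inference raises, so the context stack stays balanced.
        cuda_ctx.pop()


class CallbackRunner:
    """Creates the execution context once and reuses it for every message."""

    def __init__(self, cuda_ctx, engine):
        self.cuda_ctx = cuda_ctx
        with pushed(cuda_ctx):
            # Created a single time here, not once per callback.
            self.exec_ctx = engine.create_execution_context()

    def __call__(self, run_fn, msg):
        # run_fn is a caller-supplied function that performs the inference.
        with pushed(self.cuda_ctx):
            return run_fn(self.exec_ctx, msg)
```

Buffers allocated with cuda.mem_alloc would likewise be allocated once and reused rather than reallocated per message.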