TensorRT do_inference error

After loading a TensorRT engine to Python, and trying to run:

inputs[0].host = image
trt_outputs = common.do_inference(context, bindings=bindings, inputs=inputs,
                                  outputs=outputs, stream=stream, batch_size=1)

The following error occurs:

[TensorRT] ERROR: CUDA cask failure at execution for trt_volta_scudnn_128x32_relu_small_nn_v1.
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 33
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 33

What does it mean and how can I fix it?
Thanks

Environment:
Ubuntu 18.04
CUDA 10.0
TensorRT 5.0.4
The engine was created using the same machine.

I'm seeing the same error:

[TensorRT] ERROR: CUDA cask failure at execution for trt_maxwell_scudnn_128x64_relu_medium_nn_v1.
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 33 (invalid resource handle)
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 33 (invalid resource handle)

Any updates?
I have met this problem too. Any help is appreciated!

BTW, I use TensorRT 5.1.2 and ONNX 1.4.1.

I have solved the problem! In my case it happened because I was initializing inference in a thread; once I moved everything into a fully isolated process, it worked fine. Due to the small amount of time, I didn't have any other options.

It would be nice if NVIDIA made an example of how to use an engine in a thread.

Hi alex.dd84,

Can you give more details on how you solved it?

Taking one of the examples and implementing inference in the main thread solved my problem.

Hello, I got the same problem when running inference on images from a callback function in ROS; I had initialized the TensorRT engine and allocated memory in the main thread. When the engine ran inference in the main thread instead, the problem was solved. So how do you use TensorRT for inference across multiple threads? Thanks.

Environment:
Ubuntu 16.04
CUDA 10.0
TensorRT 5.1.5

+1

I met this exact same error on Jetson Nano. I created the TensorRT runtime, engine, and context in another thread, and tried to do inferencing in that thread. Then I hit this “ERROR: CUDA cask failure at execution for trt_volta_scudnn_128x32_relu_small_nn_v1” problem.

My previous code did everything in the main thread, and it worked fine.

Hope NVIDIA provides an example to demonstrate how to use TensorRT in a sub-thread.
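TensorRT aside, the failure mode described above is a general one: a thread-affine resource created in one thread and used in another. A GPU-free analogue using Python's `sqlite3` (whose connections are, by default, also bound to the thread that created them) shows the same symptom and the same fix, creating the resource inside the thread that uses it:

```python
import sqlite3
import threading

# A connection created in the main thread...
conn = sqlite3.connect(':memory:')

errors = []

def use_foreign_resource():
    try:
        conn.execute('SELECT 1')          # ...used from another thread: error
    except sqlite3.ProgrammingError as e:
        errors.append(str(e))

def create_and_use_locally():
    local = sqlite3.connect(':memory:')   # created inside this thread: fine
    assert local.execute('SELECT 1').fetchone() == (1,)
    local.close()

for target in (use_foreign_resource, create_and_use_locally):
    t = threading.Thread(target=target)
    t.start()
    t.join()

print(len(errors))  # prints 1: only the foreign-thread use failed
```

The same principle applies to the CUDA context and the TensorRT execution context in this thread: create them in the thread that will call `execute_async`.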

Hi! I managed to move TRT into a thread. Try the following:

device = cuda.Device(0)
context = device.make_context()

# TRT inference GOES HERE

context.pop()
del context

@alex.dd84 Yes. That solves the pycuda context problem. I already did that.

The problem is the TRT ‘context.execute_async(bindings, stream_handle)’ call would fail if it’s in the sub-thread.

@alex.dd84 You’re right. It was my own mistake. I double-checked my code: if I do all of the following in the child thread, it works. Thanks for sharing your solution.

device = cuda.Device(0)
context = device.make_context()

# TRT engine create_execution_context()
# do inferencing

context.pop()
del context
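Putting the pieces from this thread together, a minimal sketch of running inference entirely inside a worker thread might look like the following. This is a hedged sketch, not a tested implementation: the engine file name `model.trt` is a hypothetical placeholder, the buffer-allocation step is elided, and the calls assume the pycuda driver API and the TensorRT 5.x Python API:

```python
import threading

import pycuda.driver as cuda
import tensorrt as trt

def worker(engine_path):
    # Everything CUDA-related happens inside this one thread.
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()          # push a CUDA context for this thread
    try:
        logger = trt.Logger(trt.Logger.WARNING)
        with open(engine_path, 'rb') as f, trt.Runtime(logger) as runtime:
            engine = runtime.deserialize_cuda_engine(f.read())
        context = engine.create_execution_context()  # same thread as the ctx
        stream = cuda.Stream()
        # ... allocate host/device buffers and copy inputs here, then:
        # context.execute_async(batch_size=1, bindings=bindings,
        #                       stream_handle=stream.handle)
        stream.synchronize()
    finally:
        ctx.pop()                        # always pop before the thread exits
        del ctx

t = threading.Thread(target=worker, args=('model.trt',))  # hypothetical path
t.start()
t.join()
```

The key design point, per the posts above, is that the CUDA context, the deserialized engine, and the execution context are all created and used in the same thread, with `ctx.pop()` guaranteed by the `finally` block.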

So, do you guys have any example of how to perform TRT inference inside a thread?

Yes. I shared my code on GitHub: https://github.com/jkjung-avt/tensorrt_demos/blob/master/trt_ssd_async.py

I also wrote a blog post explaining the implementation details: https://jkjung-avt.github.io/speed-up-trt-ssd/

Thank you very much! Now my code can also work.

I found another alternative: we can call cuda.Context.attach() when creating the context.
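If I understand that suggestion, a minimal sketch would look like the following, assuming pycuda's `Context.attach()`, which increments the use count of the already-current context rather than creating a new one (the inference step is elided):

```python
import pycuda.driver as cuda

# In the worker thread: attach to the context that already exists
# instead of calling device.make_context() a second time.
ctx = cuda.Context.attach()
# ... run TRT inference here ...
ctx.detach()  # balance the attach when the thread is done
```

This only works if a CUDA context already exists for the process when the thread calls `attach()`.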

Would you mind sharing exactly what you did?

Has anyone tried to run inference in a ROS callback function? I tried making the context at the beginning of the callback function and still got that error.