TensorRT do_inference error

After loading a TensorRT engine to Python, and trying to run:

inputs[0].host = image
trt_outputs = common.do_inference(context, bindings=bindings, inputs=inputs,
                                  outputs=outputs, stream=stream, batch_size=1)

The following error occurs:

[TensorRT] ERROR: CUDA cask failure at execution for trt_volta_scudnn_128x32_relu_small_nn_v1.
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 33
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 33

What does it mean and how can I fix it?
Thanks

Environment:
Ubuntu 18.04
CUDA 10.0
TensorRT 5.0.4
The engine was created using the same machine.

I'm seeing the same error:

[TensorRT] ERROR: CUDA cask failure at execution for trt_maxwell_scudnn_128x64_relu_medium_nn_v1.
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 33 (invalid resource handle)
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 33 (invalid resource handle)

Any updates?
I have met this problem too. Any help is appreciated!

BTW, I use TensorRT 5.1.2 and ONNX 1.4.1.

I have solved the problem! In my case it happened because I was initializing inference in a thread; once I moved everything into a fully isolated process, it worked fine. Due to the small amount of time, I didn't have any other options.

It would be nice if NVIDIA made an example of how to use an engine in a thread.

Hi alex.dd84,

Can you give more details on how you solved it?

Taking one of the examples and implementing inference in the main thread solved my problem.

Hello, I got the same problem when running inference on images from a callback function in ROS; I had initialized the TensorRT engine and allocated memory in the main thread. When the engine ran inference in the main thread instead, the problem was solved. So how do you use TensorRT for inference across multiple threads? Thanks.

Environment:
Ubuntu 16.04
CUDA 10.0
TensorRT 5.1.5

+1

I met this exact same error on Jetson Nano. I created the TensorRT runtime, engine, and context in another thread, and tried to do inferencing in that thread. Then I hit this “ERROR: CUDA cask failure at execution for trt_volta_scudnn_128x32_relu_small_nn_v1” problem.

My previous code did everything in the main thread, and it worked fine.

Hope NVIDIA provides an example to demonstrate how to use TensorRT in a sub-thread.
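TensorRT aside, the failure mode described above is a general one: a thread-affine resource created in one thread and used in another. A GPU-free analogue using Python's `sqlite3` (whose connections are, by default, also bound to the thread that created them) shows the same symptom and the same fix, creating the resource inside the thread that uses it:

```python
import sqlite3
import threading

# A connection created in the main thread...
conn = sqlite3.connect(':memory:')

errors = []

def use_foreign_resource():
    try:
        conn.execute('SELECT 1')          # ...used from another thread: error
    except sqlite3.ProgrammingError as e:
        errors.append(str(e))

def create_and_use_locally():
    local = sqlite3.connect(':memory:')   # created inside this thread: fine
    assert local.execute('SELECT 1').fetchone() == (1,)
    local.close()

for target in (use_foreign_resource, create_and_use_locally):
    t = threading.Thread(target=target)
    t.start()
    t.join()

print(len(errors))  # prints 1: only the foreign-thread use failed
```

The same principle applies to the CUDA context and the TensorRT execution context in this thread: create them in the thread that will call `execute_async`.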

Hi! I managed to move TRT into a thread. Try the following:

device = cuda.Device(0)
context = device.make_context()

# TRT inference GOES HERE

context.pop()
del context

@alex.dd84 Yes. That solves the pycuda context problem. I already did that.

The problem is the TRT ‘context.execute_async(bindings, stream_handle)’ call would fail if it’s in the sub-thread.

@alex.dd84 You’re right. It was my own mistake. I double-checked my code: if I do all of the following in the child thread, it works. Thanks for sharing your solution.

device = cuda.Device(0)
context = device.make_context()

# TRT engine create_execution_context()
# do inferencing

context.pop()
del context
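Putting the pieces from this thread together, a minimal sketch of running inference entirely inside a worker thread might look like the following. This is a hedged sketch, not a tested implementation: the engine file name `model.trt` is a hypothetical placeholder, the buffer-allocation step is elided, and the calls assume the pycuda driver API and the TensorRT 5.x Python API:

```python
import threading

import pycuda.driver as cuda
import tensorrt as trt

def worker(engine_path):
    # Everything CUDA-related happens inside this one thread.
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()          # push a CUDA context for this thread
    try:
        logger = trt.Logger(trt.Logger.WARNING)
        with open(engine_path, 'rb') as f, trt.Runtime(logger) as runtime:
            engine = runtime.deserialize_cuda_engine(f.read())
        context = engine.create_execution_context()  # same thread as the ctx
        stream = cuda.Stream()
        # ... allocate host/device buffers and copy inputs here, then:
        # context.execute_async(batch_size=1, bindings=bindings,
        #                       stream_handle=stream.handle)
        stream.synchronize()
    finally:
        ctx.pop()                        # always pop before the thread exits
        del ctx

t = threading.Thread(target=worker, args=('model.trt',))  # hypothetical path
t.start()
t.join()
```

The key design point, per the posts above, is that the CUDA context, the deserialized engine, and the execution context are all created and used in the same thread, with `ctx.pop()` guaranteed by the `finally` block.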

So, do you guys have any example of how to perform TRT inference inside a thread?

Yes. I shared my code on GitHub: https://github.com/jkjung-avt/tensorrt_demos/blob/master/trt_ssd_async.py

I also wrote a blog post explaining the implementation details: https://jkjung-avt.github.io/speed-up-trt-ssd/

Thank you very much! Now my code can also work.

I found another alternative: we can call cuda.Context.attach() when creating the context.
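If I understand that suggestion, a minimal sketch would look like the following, assuming pycuda's `Context.attach()`, which increments the use count of the already-current context rather than creating a new one (the inference step is elided):

```python
import pycuda.driver as cuda

# In the worker thread: attach to the context that already exists
# instead of calling device.make_context() a second time.
ctx = cuda.Context.attach()
# ... run TRT inference here ...
ctx.detach()  # balance the attach when the thread is done
```

This only works if a CUDA context already exists for the process when the thread calls `attach()`.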

Would you mind sharing exactly what you did?

Has anyone tried to run inference in a ROS callback function? I tried making the context at the beginning of the callback function and still got that error.