[TensorRT] Engine error when running inference from multiple threads

Description

The problem is shown in the attached output: when we loop our inference several times (one image per loop) in a multithreaded setup, it always **shows a strange result**.

We need to use it in a TCP server (Python) to provide a real-time identification function.


./receive230116022455/ac_receive.jpg
[01/16/2023-06:04:08] [TRT] [E] 1: [convolutionRunner.cpp::execute::391] Error Code 1: Cask (Cask convolution execution)
[01/16/2023-06:04:08] [TRT] [E] 1: [checkMacros.cpp::catchCudaError::272] Error Code 1: Cuda Runtime (invalid resource handle)
['./receive230116022455/ac_receive.jpg', array([0., 0., 0., 0.], dtype=float32), 'NG1']

The engine can produce the right result, and the data file is correct. If we reload the engine in a single thread, we get the right result (OK):

./receive230116022455/ac_receive.jpg
['./receive230116022455/ac_receive.jpg', array([0.001507  , 0.01713044, 0.19878952, 0.782573  ], dtype=float32), 'OK']

Environment

TensorRT Version: 8.2.5.1
GPU Type: RTX5000
Nvidia Driver Version: 515.65
CUDA Version: 11.7
CUDNN Version:
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): python3.8

Relevant Files

Steps To Reproduce

The environment was downloaded this way:

docker run --runtime=nvidia  -dit --rm --entrypoint "" nvcr.io/nvidia/tensorrt:22.05-py3 /bin/bash
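
As a quick sanity check that the container's TensorRT and CUDA are usable from Python, something like the following can be run inside it (a minimal sketch; it assumes the stock Python bindings shipped in the nvcr.io image):

```python
# Quick sanity check inside the container: import the bindings and list GPUs.
import tensorrt as trt
import pycuda.driver as cuda

print("TensorRT version:", trt.__version__)
cuda.init()
print("CUDA devices:", cuda.Device.count())
print("GPU 0:", cuda.Device(0).name())
```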

I tried using init() and push()/pop() to avoid the TRT error in the multithreaded case, but it was useless.

```python
import pycuda.driver as cuda

cuda.init()
device = cuda.Device(0)
ctx = device.make_context()   # creates the context and pushes it on the calling thread

###########
# process()
###########
ctx.push()     # make the context current on the thread that runs inference
my.infer()     # our TensorRT inference wrapper
ctx.pop()      # detach it again afterwards
```
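
For context, here is a minimal sketch of how that push/pop pattern is usually arranged when the work happens in another thread: the context is created once in the main thread, popped there, and pushed/popped around the inference call inside the worker. `run_trt_inference()` is only a placeholder standing in for the post's `my.infer()`:

```python
import threading
import pycuda.driver as cuda

def run_trt_inference():
    # Placeholder for the post's my.infer(); the actual TensorRT calls go here.
    pass

cuda.init()
ctx = cuda.Device(0).make_context()   # created (and pushed) on the main thread
ctx.pop()                             # detach it so worker threads can push it themselves

def worker():
    ctx.push()                        # make the shared context current on this thread
    try:
        run_trt_inference()
    finally:
        ctx.pop()                     # always detach, even if inference raises

t = threading.Thread(target=worker)
t.start()
t.join()
ctx.detach()                          # release the context on shutdown
```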

How can I avoid the inference returning all-zero values in the multithreaded case? It has puzzled me for a long time!

Hi,

The links below might be useful for you.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html

For multi-threading/streaming, we suggest you use DeepStream or Triton.

For more details, we recommend you raise the query on the DeepStream forum, or in the Triton Inference Server GitHub issues section.

Thanks!

Thanks!

I found the TensorRT thread-safety notes:

The TensorRT builder may only be used by one thread at a time. If you need to run multiple builds simultaneously, you will need to create multiple builders.
The TensorRT runtime can be used by multiple threads simultaneously, so long as each object uses a different execution context.
Note: Plugins are shared at the engine level, not the execution context level, and thus plugins which may be used simultaneously by multiple threads need to manage their resources in a thread-safe manner. This is however not required for plugins based on IPluginV2Ext and derivative interfaces since we clone these plugins when ExecutionContext is created.
The TensorRT library pointer to the logger is a singleton within the library. If using multiple builder or runtime objects, use the same logger, and ensure that it is thread-safe.
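
Following those notes, here is a minimal sketch of the "one shared engine, one execution context and stream per thread" pattern. It assumes TensorRT 8.x Python bindings, an engine file named `model.engine`, a 4-element float32 output as in the logs above, and simplified buffer handling:

```python
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)      # one shared logger for the whole process

cuda.init()
cuda_ctx = cuda.Device(0).make_context()
with open("model.engine", "rb") as f:            # assumed engine file name
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())
cuda_ctx.pop()

def infer_in_thread(input_array: np.ndarray) -> np.ndarray:
    """Run one inference; each thread uses its own execution context and stream."""
    cuda_ctx.push()
    try:
        exec_ctx = engine.create_execution_context()   # per-thread execution context
        stream = cuda.Stream()                         # per-thread CUDA stream
        output = np.empty(4, dtype=np.float32)         # output size assumed from the logs
        d_input = cuda.mem_alloc(input_array.nbytes)
        d_output = cuda.mem_alloc(output.nbytes)
        cuda.memcpy_htod_async(d_input, np.ascontiguousarray(input_array), stream)
        exec_ctx.execute_async_v2([int(d_input), int(d_output)], stream.handle)
        cuda.memcpy_dtoh_async(output, d_output, stream)
        stream.synchronize()
        return output
    finally:
        cuda_ctx.pop()
```

Creating the execution context once per thread (rather than per call) and reusing pinned buffers would be more efficient, but the key point from the notes is that the engine is shared while each thread gets its own execution context and stream.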

So I changed the server to a single-threaded design, and it works well.
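
For anyone hitting the same issue, one way to keep a multithreaded TCP front end while keeping all TensorRT calls on a single thread is a dedicated inference worker that owns the CUDA context and serves requests from a queue. A rough sketch, where `cuda_ctx` and `my.infer()` stand for the CUDA context and inference wrapper from the earlier snippets and are assumed to exist:

```python
import queue
import threading

requests = queue.Queue()    # TCP handler threads put (image_path, reply_queue) tuples here

def inference_worker():
    # The only thread that touches TensorRT: it owns the CUDA context for its lifetime.
    cuda_ctx.push()                              # assumed: the context created at startup
    try:
        while True:
            image_path, reply_q = requests.get()
            reply_q.put(my.infer(image_path))    # assumed: the post's inference wrapper
    finally:
        cuda_ctx.pop()

threading.Thread(target=inference_worker, daemon=True).start()

def handle_client(image_path):
    # Called from any TCP handler thread: enqueue the request and wait for the answer.
    reply_q = queue.Queue(maxsize=1)
    requests.put((image_path, reply_q))
    return reply_q.get()
```

Because only the worker thread ever pushes the context and runs inference, the engine sees strictly serialized calls, which matches the single-threaded behaviour that works.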

I also referred to advice saying the app should give each thread its own GPU memory. But when I tried the multithreaded way again with that change, it didn't work and returned the same result:

./receive230116022455/ac_receive.jpg
[01/16/2023-06:04:08] [TRT] [E] 1: [convolutionRunner.cpp::execute::391] Error Code 1: Cask (Cask convolution execution)
[01/16/2023-06:04:08] [TRT] [E] 1: [checkMacros.cpp::catchCudaError::272] Error Code 1: Cuda Runtime (invalid resource handle)
['./receive230116022455/ac_receive.jpg', array([0., 0., 0., 0.], dtype=float32), 'NG1']