Running multiple threads with different engines in TensorRT


I am trying to run TensorRT in multiple threads, with multiple engines, on the same GPU. I have the following architecture:

  1. An INT8 engine pre-built with trtexec from a YOLOv7 ONNX model. trtexec passes successfully.
  2. A main thread that reads this model and creates an array of Engine objects. Each object has its own ICudaEngine, IExecutionContext, and non-blocking CUDA stream. The main thread initializes these objects and keeps them in the array.
  3. After initialization, parallel calls are made to these Engine objects, each with an engine ID to use. Each call does an async memory copy, calls enqueueV2 and cudaStreamSynchronize, and returns the result to the main thread.

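For reference, the threading structure described in the steps above can be sketched as follows. This is a minimal illustration of the design only: the real TensorRT and CUDA calls are shown as comments, and the `Engine` struct, `infer` signature, and the stand-in arithmetic result are placeholders, not the actual code from this setup.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder for the per-thread resources the post describes.
// In the real code these members would be nvinfer1::ICudaEngine*,
// nvinfer1::IExecutionContext*, and a cudaStream_t created with
// cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking).
struct Engine {
    int id;
    // ICudaEngine*       engine;   // deserialized once in the main thread
    // IExecutionContext* context;  // one per Engine object, never shared
    // cudaStream_t       stream;   // non-blocking stream per Engine object

    int infer(int input) {
        // Real sequence, per the post:
        //   cudaMemcpyAsync(devIn, hostIn, inSize, cudaMemcpyHostToDevice, stream);
        //   context->enqueueV2(bindings, stream, nullptr);
        //   cudaMemcpyAsync(hostOut, devOut, outSize, cudaMemcpyDeviceToHost, stream);
        //   cudaStreamSynchronize(stream);
        return input * 2 + id;  // stand-in for the real result
    }
};

// Each worker thread uses exactly one Engine object (no sharing),
// and the main thread joins all workers to collect the results.
std::vector<int> runParallel(std::vector<Engine>& engines, int input) {
    std::vector<int> results(engines.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < engines.size(); ++i) {
        workers.emplace_back([&, i] { results[i] = engines[i].infer(input); });
    }
    for (auto& w : workers) w.join();
    return results;
}
```
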
When I run this setup on an NVIDIA MX450, it behaves like a serial operation. I can see two streams running concurrently (only the copies, not the execution) and returning results correctly.

But when we run it on an RTX A2000, it gives the following error most of the time:
1: [cudaDriverHelpers.cpp::nvinfer1::CuDeleter<struct CUmod_st *,&enum cudaError_enum __cdecl nvinfer1::cuModuleUnloadWrapper(struct CUmod_st *)>::operator ()::29] Error Code 1: Cuda Driver (an illegal instruction was encountered)
followed by
CUDA initialization failure with error: 715
In some attempts I was able to get it running past this error, but then the results fluctuate (giving wrong values) for some time before settling on correct ones. I can see that CUDA utilization starts low while the results are fluctuating between threads, then reaches its maximum level, and from that point the results are stable and correct.

When I restrict the calls to a single thread, it works as expected at maximum speed, and no fluctuation is observed.
Further, when we run the two threads with two different engines serially, the results are correct and it takes twice the time, as expected.
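
As a diagnostic, the working serialized two-thread case can also be reproduced by guarding the inference call with a shared mutex, so the threads stay alive but only one enqueue/synchronize sequence runs at a time. This is a sketch under the assumption that the per-engine inference wrapper can be wrapped this way; `guardedInfer` and the stand-in arithmetic are placeholders, not actual code from this setup.

```cpp
#include <mutex>

// One process-wide lock serializing the enqueueV2/cudaStreamSynchronize
// sequence across threads; everything outside the lock still runs in
// parallel. If the crash disappears with this guard in place, the failure
// is tied to concurrent enqueues rather than to the engines themselves.
std::mutex inferMutex;

int guardedInfer(int engineId, int input) {
    std::lock_guard<std::mutex> lock(inferMutex);
    // Real sequence: cudaMemcpyAsync -> enqueueV2 -> cudaMemcpyAsync
    //                -> cudaStreamSynchronize, all on this engine's stream.
    return input * 2 + engineId;  // stand-in for the real result
}
```

With the guard, throughput drops to the serial rate (the "twice the time" case above), but it gives a clean correctness baseline to compare the parallel path against.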

Can someone help resolve this issue?

Environment

TensorRT Version : 8.4.3
GPU Type : RTX A2000 6 GB
Nvidia Driver Version: 527.27
CUDA Version: 11.8
CUDNN Version: 8.6
Operating System + Version: Windows 10 64 bit
Python Version (if applicable): –
TensorFlow Version (if applicable): –
PyTorch Version (if applicable): –
Baremetal or Container (if container which image + tag): –

Relevant Files


Steps To Reproduce


Hi, please refer to the links below to perform inference in INT8.


I don't think the INT8 conversion is the problem; otherwise it would not work in a single thread either. The same problem also exists with FP32.


The links below might be useful for you.

For multi-threading/streaming, we suggest using DeepStream or Triton.

For more details, we recommend raising the query in the DeepStream forum, or in the Triton Inference Server GitHub issues section.