CUDA error: unspecified launch failure

I’m trying to run TensorRT inference on 7 streams in parallel using multiprocessing.
One of the processes goes down while trying to load a tensor onto CUDA or while performing any other CUDA-related operation,
throwing RuntimeError: CUDA error: unspecified launch failure.
It only affects one process out of the 7 running concurrently.
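A common cause of this class of failure is a CUDA context being shared across forked processes: the CUDA driver does not support using a context initialized before a fork, so each worker should get a fresh interpreter via the "spawn" start method and do all CUDA/TensorRT setup inside the child. A minimal sketch of that layout (pure Python; `init_result` is an illustrative placeholder for the real per-process engine/context setup):

```python
import multiprocessing as mp

def init_result(stream_id):
    # Placeholder for the real per-process CUDA/TensorRT setup; in the
    # actual worker this is where the engine would be deserialized and
    # an execution context created (names here are illustrative only).
    return (stream_id, "ok")

def worker(stream_id, results):
    results.put(init_result(stream_id))

def main(num_streams=7):
    # "spawn" starts each worker in a fresh interpreter, so no CUDA state
    # is inherited from the parent process (forking after CUDA init is
    # unsupported by the driver and can lead to launch failures).
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, results))
             for i in range(num_streams)]
    for p in procs:
        p.start()
    out = [results.get() for _ in procs]  # drain before join to avoid deadlock
    for p in procs:
        p.join()
    return sorted(out)

if __name__ == "__main__":
    print(main())
```

This doesn't remove the underlying launch failure, but it rules out fork-inherited CUDA state as a contributor and keeps each of the 7 streams fully isolated.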

How to reproduce:
After running inference for 4-5 hours, we get the CUDA failure in one random process.
GPU utilization: ~90%

Server specification:
GPU: NVIDIA Tesla T4 16 GB
CPU: AMD 7262
CUDA 11.0
cuDNN 8.1
TensorRT 7.2.3.4
TRTorch 0.2.0
Ubuntu 18.04.6
Python 3.7

Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest using DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forum.
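One more practical note: "unspecified launch failure" is a sticky error, meaning the CUDA context in the affected process is poisoned and every subsequent CUDA call in it will fail, so the realistic recovery is to let that process die and spawn a replacement. A supervisor sketch along those lines (pure Python; `inference_worker` is a hypothetical stand-in for the real per-stream TensorRT loop):

```python
import multiprocessing as mp
import time

def should_restart(exitcode, restarts, max_restarts):
    # Restart only on abnormal exit, and only within the restart budget.
    # exitcode is None while alive, 0 on clean exit, nonzero/negative on crash.
    return exitcode not in (0, None) and restarts < max_restarts

def inference_worker(stream_id):
    # Hypothetical stand-in for the real per-stream TensorRT loop; a sticky
    # CUDA error would surface here and kill the process with a nonzero code.
    time.sleep(0.1)

def supervise(num_streams=7, max_restarts=3):
    """Run one process per stream; replace any process that crashes."""
    ctx = mp.get_context("spawn")  # fresh interpreter -> fresh CUDA context
    procs = {i: ctx.Process(target=inference_worker, args=(i,))
             for i in range(num_streams)}
    for p in procs.values():
        p.start()
    restarts = 0
    while any(p.is_alive() for p in procs.values()):
        for i, p in list(procs.items()):
            if should_restart(p.exitcode, restarts, max_restarts):
                p.join()
                # The old context is unusable after a sticky error, so we
                # spawn a brand-new process instead of retrying in place.
                procs[i] = ctx.Process(target=inference_worker, args=(i,))
                procs[i].start()
                restarts += 1
        time.sleep(0.05)
    for p in procs.values():
        p.join()
    return restarts
```

Pairing this with logs of which stream crashed (and `cuda-memcheck`/`compute-sanitizer` on the worker) helps narrow down whether a specific kernel or model is triggering the failure.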

Thanks!