CUDA error: unspecified launch failure

I’m trying to run TensorRT inference on 7 streams in parallel using multiprocessing.
One of the processes goes down while trying to load a tensor onto CUDA or while performing any other CUDA-related operation,
throwing RuntimeError: CUDA error: unspecified launch failure.
It only affects one process out of the 7 running concurrently.
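A common cause of this class of failure is a CUDA context being shared across forked processes: the CUDA driver does not support using a context initialized before a fork, so each worker should get a fresh interpreter via the "spawn" start method and do all CUDA/TensorRT setup inside the child. A minimal sketch of that layout (pure Python; `init_result` is an illustrative placeholder for the real per-process engine/context setup):

```python
import multiprocessing as mp

def init_result(stream_id):
    # Placeholder for the real per-process CUDA/TensorRT setup; in the
    # actual worker this is where the engine would be deserialized and
    # an execution context created (names here are illustrative only).
    return (stream_id, "ok")

def worker(stream_id, results):
    results.put(init_result(stream_id))

def main(num_streams=7):
    # "spawn" starts each worker in a fresh interpreter, so no CUDA state
    # is inherited from the parent process (forking after CUDA init is
    # unsupported by the driver and can lead to launch failures).
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, results))
             for i in range(num_streams)]
    for p in procs:
        p.start()
    out = [results.get() for _ in procs]  # drain before join to avoid deadlock
    for p in procs:
        p.join()
    return sorted(out)

if __name__ == "__main__":
    print(main())
```

This doesn't remove the underlying launch failure, but it rules out fork-inherited CUDA state as a contributor and keeps each of the 7 streams fully isolated.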

How to reproduce:
After running inference for 4-5 hours, we get the CUDA failure in one random process.
GPU utilization: ~90%

Server specification:
GPU: NVIDIA Tesla T4 16 GB
CPU: AMD 7262
CUDA 11.0
cuDNN 8.1
TensorRT 7.2.3.4
TRTorch 0.2.0
Ubuntu 18.04.6
Python 3.7

Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest using DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forum.
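One more practical note: "unspecified launch failure" is a sticky error, meaning the CUDA context in the affected process is poisoned and every subsequent CUDA call in it will fail, so the realistic recovery is to let that process die and spawn a replacement. A supervisor sketch along those lines (pure Python; `inference_worker` is a hypothetical stand-in for the real per-stream TensorRT loop):

```python
import multiprocessing as mp
import time

def should_restart(exitcode, restarts, max_restarts):
    # Restart only on abnormal exit, and only within the restart budget.
    # exitcode is None while alive, 0 on clean exit, nonzero/negative on crash.
    return exitcode not in (0, None) and restarts < max_restarts

def inference_worker(stream_id):
    # Hypothetical stand-in for the real per-stream TensorRT loop; a sticky
    # CUDA error would surface here and kill the process with a nonzero code.
    time.sleep(0.1)

def supervise(num_streams=7, max_restarts=3):
    """Run one process per stream; replace any process that crashes."""
    ctx = mp.get_context("spawn")  # fresh interpreter -> fresh CUDA context
    procs = {i: ctx.Process(target=inference_worker, args=(i,))
             for i in range(num_streams)}
    for p in procs.values():
        p.start()
    restarts = 0
    while any(p.is_alive() for p in procs.values()):
        for i, p in list(procs.items()):
            if should_restart(p.exitcode, restarts, max_restarts):
                p.join()
                # The old context is unusable after a sticky error, so we
                # spawn a brand-new process instead of retrying in place.
                procs[i] = ctx.Process(target=inference_worker, args=(i,))
                procs[i].start()
                restarts += 1
        time.sleep(0.05)
    for p in procs.values():
        p.join()
    return restarts
```

Pairing this with logs of which stream crashed (and `cuda-memcheck`/`compute-sanitizer` on the worker) helps narrow down whether a specific kernel or model is triggering the failure.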

Thanks!