CUDA cask failure at execution for trt_volta_scudnn_128x64_relu_xregs_large_nn_v1

Description

I am running TensorRT 5.0.2 with multithreading, and after running for a while the process crashes. The log is below:
[CHECK_FAILED] [/wangyusong/net_rt/caffe_compact_gpu/src/caffe/util/math_functions.cu] [89] (error) == (cudaSuccess) an illegal memory access was encountered
CUDA cask failure at execution for trt_volta_scudnn_128x64_relu_xregs_large_nn_v1.
cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 77
cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 77

[ERR][05/05-23:15:19]FATAL ERROR, BACKTRACE NOT AVAILABLE

Does anyone have any idea about this?

Environment

TensorRT Version: TensorRT 5.0.2
GPU Type: Tesla T4
Nvidia Driver Version: 450.80.02
CUDA Version: CUDA10.0
CUDNN Version: 7.3.1
Operating System + Version: Ubuntu16.04
Python Version (if applicable): None (using C++)
TensorFlow Version (if applicable): None
PyTorch Version (if applicable): None
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

None

Steps To Reproduce

The code is not mine; I am just asking for advice.

Hi,

The below links might be useful for you.
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#thread-safety

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html

For multi-threading/streaming, we suggest using DeepStream or Triton.

For more details, we recommend raising the query in the DeepStream forum, or in the issues section of the Triton Inference Server GitHub repository.

Thanks!

Thanks for your reply. It would be better if you could give me some potential reasons for the error, because I am not a direct user of TensorRT; the library I depend on contains this problem. The stack trace is below:


Thanks!

Hi,

Looks like you’re using a very old version of TensorRT. We recommend you use the latest version, TensorRT 8.4.
https://developer.nvidia.com/nvidia-tensorrt-8x-download
If you still face this issue, please share a repro ONNX model and a minimal script so we can try it on our end for better debugging.

Thank you.

Well, updating TensorRT is a solution we are also pursuing. But to put it another way: what is the cause of this error, and how can I reproduce it with 100% probability?
I am loading, inferring, and releasing the model through multithreading with TensorRT; is there any problem with that? Sorry, I really need a direction to debug.
Another core dump is below:


You can see this error is different from the one above, and I found some related information, as follows:

Hi,

The UFF and Caffe parsers have been deprecated from TensorRT 7 onwards, so we request that you try the ONNX parser.
If you still face this issue, please share a repro ONNX model and a minimal script so we can try it on our end for better debugging.

Thank you.