I am running TensorRT 5.0.2 with multithreading, and after running for a while the process crashes. The log is below:
[CHECK_FAILED] [/wangyusong/net_rt/caffe_compact_gpu/src/caffe/util/math_functions.cu] [89] (error) == (cudaSuccess) an illegal memory access was encounteredCUDA cask failure at execution for trt_volta_scudnn_128x64_relu_xregs_large_nn_v1.
cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 77
cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 77
[ERR][05/05-23:15:19]FATAL ERROR, BACKTRACE NOT AVAILABLE
Does anyone have any idea about this?
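Error 77 in the log above is `cudaErrorIllegalAddress`. A common first diagnostic step (a suggestion, not a guaranteed fix) is to run the process under `cuda-memcheck`, the memory checker shipped with CUDA 10.0; it reports which kernel performed the first out-of-bounds or misaligned access. The binary name below is a placeholder:

```shell
# Run the application under the CUDA memory checker and keep the report.
# ./my_app is a hypothetical binary name; substitute the real one.
cuda-memcheck --leak-check full ./my_app 2>&1 | tee memcheck.log
```

Note that `cuda-memcheck` slows execution down considerably, which can itself change the timing of a multithreaded race, so the crash may take longer to reproduce under it.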
Environment
TensorRT Version: 5.0.2
GPU Type: Tesla T4
Nvidia Driver Version: 450.80.02
CUDA Version: 10.0
CUDNN Version: 7.3.1
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): N/A (using C++)
TensorFlow Version (if applicable): No
PyTorch Version (if applicable): No
Baremetal or Container (if container which image + tag): No
Relevant Files
None
Steps To Reproduce
The code is not mine; I just want to ask for some advice.
Thanks for your reply. It would be better if you could give me some potential causes of the error, because I am not a direct user of TensorRT; the library I depend on contains this problem. The stack trace is below:
It looks like you're using a very old version of TensorRT. We recommend that you use the latest version, TensorRT 8.4: https://developer.nvidia.com/nvidia-tensorrt-8x-download
If you still face this issue, please share the ONNX model and a minimal script that reproduces it, so we can try it from our end for better debugging.
Well, updating TensorRT is a solution we are also pursuing. But to put it another way: what is the cause of this error, and how can I reproduce it with 100% probability?
I am loading, running inference with, and releasing the model across multiple threads with TensorRT; is there any problem with that? Sorry, I really need a direction for debugging.
Here is another core dump:
The UFF and Caffe parsers have been deprecated from TensorRT 7 onwards, hence we request you to try the ONNX parser.
If you still face this issue, please share the ONNX model and a minimal script that reproduces it, so we can try it from our end for better debugging.