I am running TensorRT 5.0.2 with multithreading, and after running for a while the process crashes. The log is below:
[CHECK_FAILED] [/wangyusong/net_rt/caffe_compact_gpu/src/caffe/util/math_functions.cu] [89] (error) == (cudaSuccess) an illegal memory access was encounteredCUDA cask failure at execution for trt_volta_scudnn_128x64_relu_xregs_large_nn_v1.
cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 77
cuda/caskConvolutionLayer.cpp (256) - Cuda Error in execute: 77
[ERR][05/05-23:15:19]FATAL ERROR, BACKTRACE NOT AVAILABLE
Does anyone have any idea about this?
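Error 77 in the log above is `cudaErrorIllegalAddress`. A common first diagnostic step (a suggestion, not a guaranteed fix) is to run the process under `cuda-memcheck`, the memory checker shipped with CUDA 10.0; it reports which kernel performed the first out-of-bounds or misaligned access. The binary name below is a placeholder:

```shell
# Run the application under the CUDA memory checker and keep the report.
# ./my_app is a hypothetical binary name; substitute the real one.
cuda-memcheck --leak-check full ./my_app 2>&1 | tee memcheck.log
```

Note that `cuda-memcheck` slows execution down considerably, which can itself change the timing of a multithreaded race, so the crash may take longer to reproduce under it.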
Environment
TensorRT Version: 5.0.2
GPU Type: Tesla T4
Nvidia Driver Version: 450.80.02
CUDA Version: 10.0
CUDNN Version: 7.3.1
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): N/A (using C++)
TensorFlow Version (if applicable): No
PyTorch Version (if applicable): No
Baremetal or Container (if container which image + tag): No
Relevant Files
None
Steps To Reproduce
The code is not mine; I just want to ask for some advice.
Thanks for your reply. It would be better if you could give me some potential causes of the error, because I am not a direct user of TensorRT; the library I depend on contains this problem. The stack trace is below:
It looks like you're using a very old version of TensorRT. We recommend that you use the latest version, TensorRT 8.4: https://developer.nvidia.com/nvidia-tensorrt-8x-download
If you still face this issue, please share the ONNX model and a minimal script that reproduces it, so we can try it from our end for better debugging.
Well, updating TensorRT is a solution we are also pursuing. But to put it another way: what is the cause of this error, and how can I reproduce it with 100% probability?
I am loading, running inference with, and releasing the model across multiple threads with TensorRT; is there any problem with that? Sorry, I really need a direction for debugging.
Here is another core dump:
The UFF and Caffe parsers have been deprecated from TensorRT 7 onwards, hence we request you to try the ONNX parser.
If you still face this issue, please share the ONNX model and a minimal script that reproduces it, so we can try it from our end for better debugging.