Description
Hi, I ran into a problem when I tried to deserialize a TensorRT engine and create the execution context. The following error was thrown:
“safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
INVALID_STATE: std::exception
INVALID_CONFIG:Deserialize the cuda engine failed.”
Let me describe my environment first.
Environment
TensorRT Version: 7.2.2.3
GPU Type: 1050Ti
Nvidia Driver Version: 440.82
CUDA Version: 10.2
CUDNN Version: 8.0.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): not used
TensorFlow Version (if applicable): not used
PyTorch Version (if applicable): not used
Baremetal or Container (if container which image + tag):
I looked for useful information on GitHub and in previous posts. The problem seems to be caused by an incompatibility among the TensorRT, CUDA, cuDNN, and CUDA driver versions.
However, I checked the cuDNN 8.0.5 support matrix and found that the corresponding CUDA and driver versions are 10.2 and 440, so I assume the versions of these dependencies are correct.
So I think I may have done something wrong during installation. Here is how I set up the environment.
Step 1: I downloaded TensorRT-7.2.2.3 (tar package), CUDA-10.2 (runfile), and cudnn-8.0.5 (tar package) from the official website. (Driver 440.82 was already installed.)
Step 2: I ran the CUDA runfile to install the CUDA toolkit (without the driver and samples), then extracted the TensorRT and cuDNN tar packages.
Step 3: I copied the header files and .so libraries from the cuDNN “include” and “lib” directories to the CUDA “include” and “lib64” directories.
Step 4: I exported the TensorRT lib path and the CUDA lib path.
Step 5: I put my resnet50.onnx model file in the TensorRT-7.2.2.3/bin directory and used ./trtexec to convert the model from .onnx to .trt:
./trtexec --onnx=resnet50.onnx --saveEngine=resnet50.trt
Step 6: I deserialized the model in C++, and it threw the error shown above.
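For reference, the export in Step 4 can be sketched roughly as below. The install locations (`TRT_ROOT`, `CUDA_ROOT`) and the use of `LD_LIBRARY_PATH` are assumptions for illustration, not the exact commands from my shell history.

```shell
# Hypothetical install locations; adjust to where the packages were actually unpacked.
TRT_ROOT="${TRT_ROOT:-$HOME/TensorRT-7.2.2.3}"
CUDA_ROOT="${CUDA_ROOT:-/usr/local/cuda-10.2}"

# Prepend both directories so they are searched before any older copies
# that may already be on the library search path.
export LD_LIBRARY_PATH="$TRT_ROOT/lib:$CUDA_ROOT/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Print the resulting search order, one directory per line.
printf '%s\n' "$LD_LIBRARY_PATH" | tr ':' '\n'
```

The order matters here: the loader walks `LD_LIBRARY_PATH` from left to right, so whichever directory comes first supplies the library when two copies share the same name.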
I am confused by the error log because installing cuDNN was quite simple. I also tried installing cuDNN from the Debian package, but it produced the same error. I have seen a previous post saying this error might be caused by running out of memory (OOM), but I am sure that deserializing a ResNet-50 model does not need more than 4 GB of memory. I checked that more than 3 GB of memory was free before deserializing the model.
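A free-memory check of this kind can be sketched as follows, assuming the figure refers to GPU memory as reported by `nvidia-smi`. The helper name `has_free_mib` and the 3000 MiB threshold are hypothetical, chosen only for illustration.

```shell
# Succeed if the free memory in MiB (first line of stdin) is at least the threshold in $1.
has_free_mib() {
    read -r free_mib
    [ "$free_mib" -ge "$1" ]
}

# With "nounits", memory.free is printed as a bare number of MiB, one line per GPU.
# Only the first GPU's line is examined by the helper.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | has_free_mib 3000; then
        echo "at least 3000 MiB free"
    else
        echo "less than 3000 MiB free"
    fi
fi
```

Running this immediately before launching the application gives a snapshot only; another process could still claim memory in between.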
There is one more thing I forgot to mention: if both cuDNN 7 and cuDNN 8 exist on the system, will that affect the deserialization process?
I don’t know if I did something wrong. Could anyone give me some advice?
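To see which cuDNN versions the dynamic loader knows about, something like the sketch below might help. The helper name `cudnn_sonames` is hypothetical; it only filters `ldconfig -p` output, so it will not see libraries that are reachable solely through `LD_LIBRARY_PATH`.

```shell
# Extract the distinct cuDNN sonames (e.g. libcudnn.so.7, libcudnn.so.8)
# from `ldconfig -p`-style text on stdin.
cudnn_sonames() {
    grep -o 'libcudnn\.so\.[0-9][0-9]*' | sort -u
}

# Query the loader cache if ldconfig is available on this machine.
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | cudnn_sonames
fi
```

If more than one soname is printed, both versions are registered system-wide. Running `ldd` on the application binary and filtering for `libcudnn` would then show which copy that binary actually resolves.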