Cudnn Error in initializeCommonContext

Description

Hi, I ran into a problem when I tried to deserialize a TensorRT engine and create the execution context. The system threw an error like the one below:
safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
INVALID_STATE: std::exception
INVALID_CONFIG: Deserialize the cuda engine failed.

So I would like to describe my environment first.

Environment

TensorRT Version: 7.2.2.3
GPU Type: GeForce GTX 1050 Ti
Nvidia Driver Version: 440.82
CUDA Version: 10.2
CUDNN Version: 8.0.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): not used
TensorFlow Version (if applicable): not used
PyTorch Version (if applicable): not used
Baremetal or Container (if container which image + tag):

I tried to find some useful information on GitHub and in previous posts. It seems the problem is caused by an incompatibility among the TensorRT, CUDA, cuDNN, and CUDA driver versions.

However, I have checked the cuDNN 8.0.5 support matrix and found that the corresponding CUDA and driver versions are 10.2 and 440, so I would assume these dependency versions are correct.

Then I thought there might have been some mistake during installation, so I would like to show how I built the environment.

Step 1: I downloaded TensorRT-7.2.2.3 (tar package), CUDA 10.2 (runfile), and cuDNN 8.0.5 (tar package) from the official website. (Driver 440.82 was already installed.)
Step 2: I ran the CUDA runfile to install the CUDA toolkit (without the driver and samples), and decompressed the TensorRT and cuDNN tar packages.
Step 3: I copied the header files and .so libraries from the cuDNN include/lib directories into the CUDA include/lib64 directories.
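For reference, the copy in Step 3 followed the usual tar-file installation commands from the cuDNN install guide; the paths below assume the default /usr/local/cuda prefix and the cuda/ directory that the cuDNN tarball extracts to:

```shell
# Copy cuDNN headers and libraries into the CUDA toolkit tree
# (run from the directory where the cuDNN tarball was extracted).
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
```

The `-P` flag preserves the libcudnn symlinks instead of duplicating the libraries.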
Step 4: I exported the TensorRT lib path and cuda lib path.
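The exports in Step 4 looked roughly like this (the TensorRT install path under $HOME is an assumption based on the tar-package name; adjust to where the archive was actually extracted):

```shell
# Make the TensorRT and CUDA shared libraries visible to the dynamic linker.
export TRT_HOME="$HOME/TensorRT-7.2.2.3"
export LD_LIBRARY_PATH="$TRT_HOME/lib:/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH"
echo "$LD_LIBRARY_PATH"
```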
Step 5: I put my resnet50.onnx model file into the TensorRT-7.2.2.3/bin directory and used ./trtexec to convert the model from .onnx to .trt:
./trtexec --onnx=resnet50.onnx --saveEngine=resnet50.trt
Step 6: I deserialized the model in C++, and it threw the error shown above.
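One way to narrow this down is to let trtexec itself deserialize and run the engine saved in Step 5, taking my C++ code out of the picture; if this also fails, the problem is in the environment rather than in the application code:

```shell
# Deserialize and benchmark the saved engine using trtexec only.
./trtexec --loadEngine=resnet50.trt
```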

I am confused by the error log because the cuDNN installation was quite simple. I also tried installing cuDNN from the Debian package, but it caused the same problem. I have seen a previous post saying this error might be caused by OOM, but I am sure deserializing a ResNet-50 model will not use more than 4 GB of memory, and I checked that more than 3 GB of GPU memory was free before deserializing the model.
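The free-memory check can be done with a standard nvidia-smi query right before deserialization:

```shell
# Report free GPU memory (in MiB) before deserializing the engine.
nvidia-smi --query-gpu=memory.free --format=csv,noheader
```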

Also, there is one thing I forgot to mention: if both cuDNN 7 and cuDNN 8 exist on the system, will that affect the deserialization process?
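Having both cuDNN 7 and cuDNN 8 installed can mean the wrong copy gets picked up at runtime. These diagnostic commands (standard Linux tooling, nothing TensorRT-specific; `./your_app` is a placeholder for the actual binary) show which libcudnn the dynamic linker sees and which copy a binary actually resolves:

```shell
# List every libcudnn the dynamic linker knows about.
ldconfig -p | grep libcudnn
# Show which cuDNN the application binary will actually load
# (replace ./your_app with the real binary name).
ldd ./your_app | grep cudnn
```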

I don’t know if I did something wrong; could anyone give some advice?


Hi,

The “Could not initialize cuDNN” error usually occurs due to a dependency version mismatch. Could you please clean up cuDNN and TensorRT and try installing them again?

You can also use the TensorRT NGC container to avoid system dependency issues: Container Release Notes :: NVIDIA Deep Learning TensorRT Documentation

Thank you.

Hi, thank you for your reply. I finally found the problem, and it was really tricky.
Firstly, my entire project consists of two major parts: the TensorFlow C API (TFC) and TensorRT. In the old setup, these two SDKs were at versions 1.15 (TFC) and 7.0.0.11 (TensorRT), and the system environment was CUDA 10.0 + cuDNN 7.6.5. At that time everything worked perfectly and no error occurred.
Then TensorRT was upgraded to 7.2.2.3, so the CUDA and cuDNN versions needed to be updated to 10.2 and 8.0.5. The tricky part is that TFC can run on the CPU and does not throw any error even though version 1.15 is not compatible with CUDA 10.2, while TensorRT, which was supposed to be fine, threw the cuDNN initialization error shown above; it was caused by a conflict between TFC and TensorRT. The way I proved this was to unload the entire TFC module and load only the TensorRT part. Only then was the TensorRT section able to run, so I guess something between TFC and TensorRT restricts the behaviour of cuDNN when the TensorRT engine is deserialized.
The problem is that the entire project was compiled in one environment, which means TensorRT and TFC have to share the same environment at runtime; that does not seem realistic in the current situation. So what I will do is isolate the two sections and compile them against different dependencies.

I can’t think of any better way, and I am not 100% sure the error above is indeed caused by the conflict between TFC and TensorRT.
Looking for a professional solution!