CUDA, cuDNN, and PyTorch Compatibility

Hello,

I’m trying to set up a specific environment on my university’s HPC, which restricts sudo access. The HPC has Python >=3.9 and CUDA >=11.7. For my project, I need Python 3.6 and PyTorch 0.4.1, compatible with CUDA 9.2 and cuDNN 7.2.1.

What I’ve done:

  1. Created a conda environment with Python 3.6.
  2. Installed cudatoolkit=9.2 and cudnn=7.2.1.
  3. Installed PyTorch 0.4.1 using conda install pytorch=0.4.1 cuda92 -c pytorch.

Issues:

  • When installing PyTorch 0.4.1 with conda in this environment, I got package conflicts, so I created a Python venv inside the conda environment and installed PyTorch 0.4.1 there with pip.
  • nvcc --version reports CUDA 9.2, and torch.version.cuda also returns 9.2, which is what I expect.
  • However, torch.backends.cudnn.version() returns 7.1 instead of 7.2.1.
  • During training, I encounter the error: RuntimeError: CuDNN error: CUDNN_STATUS_EXECUTION_FAILED.
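
The checks listed above can be gathered in one short script, which also shows which interpreter and which torch installation are actually in use (relevant with a venv nested inside a conda environment). This is only a minimal sketch: the helper names decode_cudnn_version and collect_versions are made up for illustration, and the decoding assumes the cuDNN 7.x integer encoding of major*1000 + minor*100 + patch.

```python
import sys


def decode_cudnn_version(raw):
    """Turn an integer such as 7102 into the string "7.1.2".

    Assumption: cuDNN 7.x encoding, major*1000 + minor*100 + patch.
    Returns None when no cuDNN version is reported.
    """
    if raw is None:
        return None
    major, rest = divmod(int(raw), 1000)
    minor, patch = divmod(rest, 100)
    return "{}.{}.{}".format(major, minor, patch)


def collect_versions():
    """Gather the versions that the running interpreter actually sees."""
    info = {"python": sys.version.split()[0], "executable": sys.executable}
    try:
        import torch
    except ImportError:
        # torch is not importable from this interpreter
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["torch_path"] = torch.__file__           # which installation was imported
    info["cuda_built_with"] = torch.version.cuda  # CUDA version torch was built against
    info["cuda_available"] = torch.cuda.is_available()
    info["cudnn"] = decode_cudnn_version(torch.backends.cudnn.version())
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print("{:>16}: {}".format(key, value))
```

Running it from inside the activated venv and comparing torch_path with the conda environment's site-packages directory should indicate whether the pip-installed or the conda-installed PyTorch is the one being loaded.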

Questions:

  1. Why is cuDNN version 7.1 instead of 7.2.1?
  2. How can I correctly set up the environment to avoid conflicts?
  3. How can I resolve the cuDNN error?

Thank you for your help!