New to GPU computing: I have installed a GTX 1660 Ti with compute capability 7.5. I'm running Ubuntu 21.10 and have set up the NVIDIA toolkit, cuDNN, tensorflow, and tensorflow-gpu in a conda env, and all appears to work fine: 1 GPU visible, built with CUDA 11.6.r11.6, TF version 2.8.0, Python version 3.7.10, all in a conda env running in a Jupyter notebook. All seems to run fine until I attempt to train a model, and then I get this error message:
2022-03-19 04:42:48.005029: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8302
and then the kernel just locks up and crashes. BTW the code worked prior to installing the GPU, when it simply used the CPU. Is this simply a version mismatch somewhere between the Python, TensorFlow, tensorflow-gpu, and cuDNN versions, or something more sinister? Thx. J.
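For anyone trying to narrow this down, a quick sanity check like the sketch below (purely illustrative, not from the original setup) prints the GPUs TensorFlow can see and the CUDA/cuDNN versions this TF build was compiled against, so a mismatch with the locally installed cuDNN should show up immediately:

```python
import tensorflow as tf

# GPUs visible to TensorFlow
print("TF version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

# CUDA/cuDNN versions this TensorFlow build was compiled against;
# a mismatch with the locally installed cuDNN is a common cause of
# crashes right after the "Loaded cuDNN version ..." log line.
build = tf.sysconfig.get_build_info()
print("Built against CUDA:", build.get("cuda_version"))
print("Built against cuDNN:", build.get("cudnn_version"))
```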
Hi, I have exactly the same problem. I am using TensorFlow 2.8.0 with CUDA 11.2 and Python 3.7.6, with one GPU visible. The code also worked before I set up the toolkits and environment variables, but now it crashes after the same message.
2022 15:45:05.860344: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8400
I would really appreciate it if someone could help out with this issue, or maybe @jimmaasuk you have found a solution to this problem?
I have the same problem. The environment is Python 3.8.10, TF 2.8, Keras 2.8. Output log:
Epoch 1/30
2022-04-23 11:25:22.138221: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8400
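The "Loaded cuDNN version ..." line is printed when TensorFlow first initializes cuDNN (for example, on the first convolution), so a minimal, self-contained repro along the lines of the sketch below (a tiny Conv2D model on random data, purely illustrative and not anyone's actual model here) can help separate a cuDNN/CUDA setup problem from a problem in the original training code:

```python
import numpy as np
import tensorflow as tf

# Random stand-in data: 32 "images" of 28x28x1 with 10 classes.
x = np.random.rand(32, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(32,))

# A tiny conv net; the first Conv2D call is what forces cuDNN to load.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# If this crashes right after "Loaded cuDNN version ...", the problem is in
# the CUDA/cuDNN setup rather than in the original model code.
model.fit(x, y, epochs=1, batch_size=8)
```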
Hello,
I had the same issue. Although I do not know how to fix the underlying problem, I downgraded TensorFlow and Keras to 2.4.0, CUDA to 11.0, and cuDNN to 8.0, and now I can finally work without issues.
Best of luck
I had the same issue. It went away after I made sure that I had TF-gpu installed and then downgraded the cuDNN files from 8.4.1.50 to 8.1.1.33. My setup is TF 2.9, CUDA 11.2, and cuDNN 8.1.1.33. By the way, cuDNN 8.4.1.50 says it is for CUDA 11.x; maybe that is the problem in the first place.
In my case it wasn't an issue with versions but was related to how the GPUs were communicating with each other. The solution was to set NCCL_SHM_DISABLE=1. By default NCCL (NVIDIA Collective Communications Library) uses shared memory for GPU communication, and for some reason this didn't work for me. By setting this environment variable to 1 you disable the shared-memory transport, and NCCL instead uses other ways to communicate, such as IP sockets. I hope this helps others who were stuck on the same issue.
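For completeness, a minimal sketch of that workaround is below; the MirroredStrategy part is just an illustrative multi-GPU setup, not necessarily what was being run. The key point is setting the variable before TensorFlow (and therefore NCCL) is initialized:

```python
import os

# Disable NCCL's shared-memory transport; this must be set before NCCL
# initializes, so do it before importing/initializing TensorFlow.
os.environ["NCCL_SHM_DISABLE"] = "1"

import tensorflow as tf

# Illustrative multi-GPU setup that uses NCCL for cross-device all-reduce;
# with the variable above, NCCL falls back to other transports (e.g. sockets).
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())
print("Replicas in sync:", strategy.num_replicas_in_sync)
```

The variable can equally be exported in the shell (e.g. before launching Jupyter), as long as it is set before the first NCCL call.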