Using GPU for TensorFlow: kernel crashes when training a model

New to GPU computing. I have installed a GTX 1660 Ti (compute capability 7.5) on Ubuntu 21.10 and set up the NVIDIA toolkit, cuDNN, tensorflow and tensorflow-gpu in a conda env. Everything appears to work fine: 1 GPU visible, CUDA build 11.6.r11.6, TF version 2.8.0, Python version 3.7.10, all in the conda env running in a Jupyter notebook. It all seems to run fine until I attempt to train a model, and then I get this message:

2022-03-19 04:42:48.005029: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8302

and then the kernel just locks up and crashes. BTW, the code worked prior to installing the GPU, when it simply used the CPU. Is this just a version mismatch somewhere between the Python, TensorFlow, tensorflow-gpu and cuDNN versions, or something more sinister? Thx. J.
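For anyone debugging the same thing, a quick sanity check I would run first in a plain Python script rather than the notebook (a minimal sketch; the exact output depends on your install):

import tensorflow as tf

print("TF:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPUs:", tf.config.list_physical_devices("GPU"))

# A tiny convolution forces cuDNN to load, so if the crash happens here
# you get a full stack trace instead of a silently dying Jupyter kernel.
x = tf.random.normal([1, 32, 32, 3])
y = tf.keras.layers.Conv2D(8, 3)(x)
print(y.shape)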


Hi, I have exactly the same problem. I am using TensorFlow 2.8.0 with CUDA 11.2 and Python 3.7.6, with one GPU visible. The code also worked before I set up the toolkits and environment variables, but now it crashes after the same message.

2022 15:45:05.860344: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8400

I would really appreciate it if someone could help out with this issue, or maybe @jimmaasuk you have found a solution to this problem?

Best wishes,
Bartosz


Me too, I would appreciate it very much.

Same exact problem here, running on TF 2.8 and Python 3.9 in PyCharm with a conda env. Would very much appreciate some help with this.

I have the same problem. The environment is Python 3.8.10, TF 2.8, Keras 2.8. Output log:

Epoch 1/30
2022-04-23 11:25:22.138221: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8400

Hello,
I had the same issue. Although I do not know how to fix the problem itself, I downgraded TensorFlow and Keras to 2.4.0, CUDA to 11.0 and cuDNN to 8.0, and now I can finally work without issues.
Best of luck

Me too, same issue.
GTX 1060, TF 2.8 (using the V1 API), Python 3.9, CUDA 11.6, cuDNN 8.4.0.27 for Windows.

The code worked well on the CPU until cuDNN was installed.

2022-05-03 23:36:26.225210: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8400

Process finished with exit code -1073740791 (0xC0000409)


My issue was resolved.

NVIDIA driver, cuDNN version, Python, CUDA… everything was OK.

Root cause: the wrong TensorFlow package (CPU-only) had been installed in PyCharm. I removed it and installed the tensorflow-gpu package instead.
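If you suspect the same thing, a quick way to check (a minimal sketch; a CPU-only wheel reports False here even when CUDA and cuDNN are installed system-wide):

import tensorflow as tf

# False means the installed wheel was compiled without CUDA support
print(tf.test.is_built_with_cuda())
print(tf.config.list_physical_devices("GPU"))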

Hello,
I’m experiencing the very same issue right now, same GPU model.

@lx6636 your resolution didn’t help me :(

@jimmaasuk (or anyone else in this thread) - did any of you find a solution for this?

Resolved by installing Zlib and adding it to the PATH.

Hello, you can try this:

import os
# Hide all GPUs so TensorFlow falls back to the CPU (set this before importing tensorflow)
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

TensorFlow was slower, but it will work.


I had the same issue. It went away after I made sure that I had TF-gpu installed and then downgraded the cuDNN files from 8.4.1.50 to 8.1.1.33. My setup is TF 2.9, CUDA 11.2 and cuDNN 8.1.1.33. By the way, cuDNN 8.4.1.50 says it is for CUDA 11.x; maybe that is the problem in the first place.
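For matching versions, a quick way to see which CUDA/cuDNN the installed TensorFlow wheel was built against (a minimal sketch; the key names can vary slightly between TF releases):

import tensorflow as tf

# CUDA/cuDNN versions the TF wheel itself was compiled against
info = tf.sysconfig.get_build_info()
print(info.get("cuda_version"), info.get("cudnn_version"))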


Thank you, this helped. I used CUDA 11.3 and cuDNN 8.3, with TF and tf-gpu 2.10.
It is mostly a compatibility issue that we need to figure out.

One other thing: initially it gives a warning that there is not enough memory to allocate, after which it runs fine.
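If you keep seeing that allocation warning, one thing worth trying (a minimal sketch, not a confirmed fix for this thread) is letting TensorFlow grow GPU memory on demand instead of reserving it all up front:

import tensorflow as tf

# Must run before anything touches the GPU
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)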

Thank you bro, it's working.
I'm very happy, because I tried for one week to identify a solution.

In my case it wasn't an issue with versions but with how the GPUs were communicating with each other. The solution was to set NCCL_SHM_DISABLE=1. By default NCCL (the NVIDIA Collective Communications Library) uses shared memory for GPU-to-GPU communication, and for some reason this didn't work for me. By setting this environment variable to 1 you disable the shared-memory transport, and NCCL falls back to other transports such as IP sockets. I hope this helps others who were stuck on the same issue.
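If you want to try this from Python rather than the shell, a minimal sketch (the variable has to be set before TensorFlow/NCCL initialise; exporting it in the shell before launching Python works just as well):

import os

# Disable NCCL's shared-memory transport; it falls back to e.g. IP sockets
os.environ["NCCL_SHM_DISABLE"] = "1"

import tensorflow as tf  # imported after setting the variable on purpose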