Use gpu for tensorflow, crashes

jimmaasuk · March 19, 2022, 9:12am

I’m new to GPU computing and have installed a GTX 1660 Ti with compute capability 7.5. I’m on Ubuntu 21.10 and have set up nvidia-toolkit, cuDNN, tensorflow and tensorflow-gpu in a conda env. Everything appears to work: 1 GPU visible, built with CUDA 11.6.r11.6, tf version 2.8.0, python version 3.7.10, all in a conda env running in a Jupyter notebook. All seems to run fine until I attempt to train a model, and then I get this message:

2022-03-19 04:42:48.005029: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8302

and then the kernel just locks up and crashes. BTW the code worked prior to installing gpu, when it simply used cpu. Is this simply a version mismatch somewhere between python, tensorflow, tensorflow-gpu, cudnn versions or something more sinister? Thx. J.
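
A side note on the log line above: the `I` prefix marks it as an informational message, not an error, so the crash happens after cuDNN has been loaded. cuDNN 8.x reports its version as one integer (major * 1000 + minor * 100 + patch), so 8302 reads as 8.3.2. A minimal sketch for decoding it follows. The function name `parse_cudnn_version` is made up for this example, and the encoding is assumed to be the cuDNN 8 one (later major versions use a different scheme).

```python
import re

def parse_cudnn_version(log_line):
    """Return (major, minor, patch) from a 'Loaded cuDNN version NNNN' line, or None."""
    match = re.search(r"Loaded cuDNN version (\d+)", log_line)
    if match is None:
        return None
    number = int(match.group(1))
    # Assumed cuDNN 8.x encoding: major * 1000 + minor * 100 + patch
    major, rest = divmod(number, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch

line = "I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8302"
print(parse_cudnn_version(line))  # (8, 3, 2)
```

Knowing the exact cuDNN version makes it easier to compare against the version your TensorFlow release expects.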

bartosz.m.prokop · April 7, 2022, 1:50pm

Hi, I have exactly the same problem. I am using tensorflow 2.8.0 with CUDA 11.2 and python 3.7.6, with one GPU visible. The code also worked before I set up the toolkits and environment variables, but now it crashes after the same message.

2022-04-07 15:45:05.860344: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8400

I would really appreciate it if someone could help out with this issue, or maybe @jimmaasuk, if you have found a solution for this problem, could you share it?

Best wishes,
Bartosz

1319885580 · April 9, 2022, 12:59am

Me too, I would appreciate it very much.

benjamin.colbeck · April 10, 2022, 12:14am

Same exact problem here, running on tf 2.8 and python 3.9 in PyCharm with a Conda env. Would very much appreciate some help with this.

ma7226087 · April 23, 2022, 5:24am

I have the same problem. The environment is py3.8.10, tf2.8, keras 2.8. Output log:

Epoch 1/30
2022-04-23 11:25:22.138221: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8400

tamvakis · April 25, 2022, 7:35am

Hello,
I had the same issue. I do not know how to fix the underlying problem, but I downgraded to tensorflow and keras 2.4.0, CUDA 11.0 and cuDNN 8.0, and now I can finally work without issues.
Best of luck

lx6636 · May 3, 2022, 3:25pm

Me too.
Same issue.
GTX 1060, tf 2.8 (using V1 API), python 3.9, CUDA 11.6, cuDnn_windows_8.4.0.27

The code worked well with the CPU, until cuDNN was installed.

2022-05-03 23:36:26.225210: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8400

Process finished with exit code -1073740791 (0xC0000409)

lx6636 · May 4, 2022, 7:40am

My issue was resolved.

Nvidia driver, cuDNN version, python, CUDA... everything was OK.

Root cause: the wrong version of tensorflow (CPU only) was installed in PyCharm. I removed it and installed the tensorflow-gpu version.
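
To check for the situation described above, one option is to list which TensorFlow distributions are installed for the interpreter your IDE actually uses. This is only a sketch: the candidate package names are an assumed, non-exhaustive list, the helper name `installed_tensorflow_packages` is made up for this example, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib import metadata

# Assumed list of distributions that can provide the `tensorflow` module (not exhaustive).
CANDIDATES = ("tensorflow", "tensorflow-gpu", "tensorflow-cpu", "tensorflow-intel")

def installed_tensorflow_packages(candidates=CANDIDATES):
    """Map each installed candidate distribution to its version string."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this interpreter's environment
    return found

print(installed_tensorflow_packages())
```

Run it with the same interpreter that PyCharm (or Jupyter) is configured to use. If more than one entry shows up, or only a CPU-only one, that may be worth cleaning up before looking at CUDA or cuDNN.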

danielya · June 9, 2022, 7:30am

Hello,
I’m experiencing the very same issue right now, same GPU model.

@lx6636 your resolution didn’t help me :(

@jimmaasuk (or anyone else in this post) - did any of you find a solution for that??

danielya · June 9, 2022, 10:19am

Resolved by installing zlib and adding it to PATH.
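
For anyone wanting to verify this, here is a small sketch that reports which PATH directories contain a given library file. It assumes Windows, where the zlib DLL that recent cuDNN 8 releases look for is, as far as I know, named `zlibwapi.dll`; the post does not say which OS was used, and `find_on_path` is a made-up helper name.

```python
import os

def find_on_path(filename, path=None):
    """Return every directory in PATH (or the given path string) that contains `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits

# Assumption: Windows, where the zlib DLL is named zlibwapi.dll.
print(find_on_path("zlibwapi.dll"))
```

An empty list means the file is not reachable through PATH for the process that ran the check, which may differ from the PATH your IDE or notebook server sees.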

rusandrkozak · June 24, 2022, 9:34am

Hello, you can try this:

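import os
# Hide all GPUs from CUDA so that TensorFlow falls back to the CPU.
# Set this before importing tensorflow.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"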

TensorFlow will be slower, but it will work.

samaster · July 15, 2022, 9:22pm

I had the same issue. It went away after I made sure that I had TF-gpu installed and then downgraded the cuDNN files from 8.4.1.50 to 8.1.1.33. My setup is TF 2.9, CUDA 11.2 and cuDNN 8.1.1.33. By the way, cuDNN 8.4.1.50 says it is for CUDA 11.x; maybe that is the problem in the first place.
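
To see which CUDA and cuDNN versions a TensorFlow binary was built against, and compare them with what is installed, something like the following may help. It is a sketch that assumes a TensorFlow version providing `tf.sysconfig.get_build_info()` (2.3 or newer, as far as I know); the wrapper name `tensorflow_build_versions` is made up. For what it is worth, TensorFlow's tested build table lists CUDA 11.2 with cuDNN 8.1 for TF 2.9, which matches the setup that worked above.

```python
def tensorflow_build_versions():
    """Return the CUDA/cuDNN versions TensorFlow was built against, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = tf.sysconfig.get_build_info()
    # CPU-only builds may lack the cuda_version / cudnn_version keys, hence .get()
    return {
        "tensorflow": tf.__version__,
        "is_cuda_build": info.get("is_cuda_build"),
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }

print(tensorflow_build_versions())
```

If `is_cuda_build` is false or missing, the installed package is a CPU-only build and no cuDNN downgrade will make it use the GPU.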

vineeth.p2000 · September 28, 2022, 6:49am

Thank you, this helped. I used CUDA 11.3 and cuDNN 8.3, with tf and tf-gpu 2.10.
It’s mostly a compatibility issue that we need to figure out.

One other thing: initially it gives a warning about insufficient memory allocation, after which it runs fine.

gokulakrishnanpackiam · November 11, 2022, 2:33am

Thank you bro, it’s working.
I’m very happy, because I spent a week trying to find a solution.

sandrewge · March 7, 2024, 11:36am

In my case it wasn’t an issue with versions but with how the GPUs were communicating with each other. The solution was to set NCCL_SHM_DISABLE=1. By default NCCL (NVIDIA Collective Communications Library) uses shared memory for GPU communication, and for some reason this didn’t work for me. Setting this environment variable to 1 disables the shared-memory transport, so NCCL uses other ways to communicate, such as IP sockets. I hope this helps others who were stuck with the same issue.
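
One way to apply this from Python, sketched under the assumption that the variable is set before the framework (and therefore NCCL) is initialised. The post does not say how it was set, and exporting it in the shell before launching should work equally well. The helper name `disable_nccl_shared_memory` is made up.

```python
import os

def disable_nccl_shared_memory():
    """Ask NCCL to skip its shared-memory transport in this process and its children."""
    os.environ["NCCL_SHM_DISABLE"] = "1"
    return os.environ["NCCL_SHM_DISABLE"]

disable_nccl_shared_memory()
# import tensorflow as tf  # import the framework only after the variable is set
```

This only matters for multi-GPU setups that go through NCCL; on a single-GPU machine it is unlikely to change anything.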
