GeForce GTX 1660 Super: CUDA not working in Anaconda

gupta.nikhil0126 · December 8, 2019, 8:23am

I recently purchased a GTX 1660 Super graphics card and installed it in my Ubuntu 18.04 Linux system, but I am not able to use it for my deep learning programs. I am currently using an Anaconda Jupyter notebook with Python 3.6, Keras 2.3.1, tensorflow 2.0, tensorflow-gpu 2.0, cuDNN 7.6.4, cudatoolkit 10.0.130, and NVIDIA driver 410.

With the above drivers and packages I am not able to run my code. The error I get is:
“Failed to get convolution algorithm. This is probably because cuDNN failed to initialize”, and sometimes also “out of memory”.

Please let me know how I can solve this issue, specifically with respect to Anaconda Navigator.

NVES_R · December 11, 2019, 1:28am

Hi,

As a quick smoke test, can you check that the nvidia-smi command works in the terminal?

If that works with no errors, then perhaps you’re running out of memory during your application. You can run watch -n 0.1 nvidia-smi in a separate shell while your app is running to see if the memory looks like it approaches the maximum before the error.
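The same check can be scripted if a log of the readings is more convenient than watching the terminal. This is only a sketch: it assumes nvidia-smi is on the PATH, and the helper names (parse_gpu_memory, query_gpu_memory, log_gpu_memory) are made up for illustration.

```python
import subprocess
import time


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs, one line per GPU, into a list of tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Ask nvidia-smi for the used and total memory (MiB) of every GPU."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_memory(out)


def log_gpu_memory(interval_s=0.5, samples=20):
    """Print one line per GPU per sample so a sudden jump is easy to spot."""
    for _ in range(samples):
        for idx, (used, total) in enumerate(query_gpu_memory()):
            print("GPU %d: %d / %d MiB" % (idx, used, total))
        time.sleep(interval_s)
```

Calling log_gpu_memory() from a second terminal while the training script runs should show whether usage climbs gradually or jumps to the maximum in one step.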

gupta.nikhil0126 · December 11, 2019, 8:33am

Hi

I ran the command in a separate shell. The memory fills up to 6 GB and the error appears at the same moment. It is a sudden jump from 65 MB to 6 GB of memory. But how can this happen? It seems that the memory is full with only 100 MB of data.

NVES_R · December 11, 2019, 8:56pm

Hi,

I can’t say for sure without knowing what code you’re running, but I would assume the sudden jump comes from loading a dataset or model into memory that is larger than your available 6 GB, hence the OOM error. I may be able to help more if you share your scripts, but in general this doesn’t seem like a bug; it just seems like your GPU doesn’t have enough memory for the task you’re trying to accomplish.

gupta.nikhil0126 · December 12, 2019, 6:53pm

Hi,

The dataset is small, around 163 MB. Below is the code I am trying to run with the package versions mentioned above. It works on the CPU but fails when run on the GPU.

import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import numpy as np
import os

# batch, classes, epochs
batch_size = 32
num_classes = 10
epochs = 50

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# model architecture
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# compile the model
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

# convert to float, normalise the data
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# train 
model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True)
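For a sense of scale, the size of the arrays in the script above can be estimated with a little arithmetic. This sketch assumes the standard CIFAR-10 shapes (50,000 training and 10,000 test images of 32x32x3 values); array_mib is a hypothetical helper, not part of Keras.

```python
def array_mib(num_images, height=32, width=32, channels=3, bytes_per_value=4):
    """Size in MiB of an image array with the given shape and element width."""
    return num_images * height * width * channels * bytes_per_value / 2**20


# Raw uint8 data as loaded by cifar10.load_data() (1 byte per value).
total_uint8 = array_mib(50000, bytes_per_value=1) + array_mib(10000, bytes_per_value=1)

# After astype('float32') every value takes 4 bytes.
total_float32 = array_mib(50000) + array_mib(10000)

print("uint8 arrays:   %.1f MiB" % total_uint8)
print("float32 arrays: %.1f MiB" % total_float32)
```

Even after the float32 conversion the input data comes to roughly 0.7 GiB, well below 6 GB, so the data alone does not explain a full GPU.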

NVES_R · December 12, 2019, 7:31pm

Hi,

I just ran your code and confirmed the model is only using about 1 GB of GPU memory. TensorFlow by default allocates almost all of the GPU memory right at the start. If other running processes are already using GPU memory, that might make it run out.

You can set the config to dynamically grow GPU memory as needed, and this way you shouldn’t run out unless the model actually requires more than you have.

Try adding this code snippet at the top of your script.

For TF1, using NGC container “nvcr.io/nvidia/tensorflow:19.10-py3”:

root@3efd20740a2a:/mnt# python -m pip freeze | grep -i -e tensorflow -e keras
Keras==2.3.1
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0+nv

import keras
import tensorflow as tf
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
keras.backend.tensorflow_backend.set_session(sess)

Source: Allowing GPU memory growth command does not work · Issue #11584 · keras-team/keras · GitHub

For TF2, using NGC container “nvcr.io/nvidia/tensorflow:19.11-tf2-py3”:

root@c03f96d089ad:/mnt# python -m pip freeze | grep -i -e tensorflow -e keras
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.0.5
tensorflow-addons==0.5.2
tensorflow-datasets==1.2.0
tensorflow-estimator==2.0.1
tensorflow-gpu==2.0.0+nv
tensorflow-metadata==0.15.0

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

Source: Tensorflow v2 Limit GPU Memory usage · Issue #25138 · tensorflow/tensorflow · GitHub
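TF2 also has a native setting for this that does not go through the compat.v1 session. The sketch below assumes TensorFlow 2.0, where the calls live under tf.config.experimental; the wrapper name enable_memory_growth is only illustrative.

```python
def enable_memory_growth():
    """Turn on memory growth for every visible GPU and return how many were set.

    This has to run before any operation touches the GPU; once the devices
    are initialized TensorFlow raises a RuntimeError instead.
    """
    # Imported here so the helper can be defined before TensorFlow is loaded.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Call enable_memory_growth() at the very top of the script, before building the model. A return value of 0 means TensorFlow did not find a GPU at all, which would point to a driver or CUDA problem rather than a memory one.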

gupta.nikhil0126 · December 13, 2019, 3:20am

Hi,

Can you please let me know which NVIDIA driver, cuDNN, and CUDA versions I should use with the 1660 Super for the TF1 or TF2 Docker containers? I currently have the latest NVIDIA driver 440.3, cuDNN 7.6.4, and cudatoolkit 10.0.130 installed. I think these might be conflicting.

NVES_R · December 13, 2019, 5:18am

Hi gupta,

I don’t believe it will conflict, though I can’t say for sure. Can you share the commands you’re running and the corresponding full errors you’re getting?

gupta.nikhil0126 · December 14, 2019, 6:42am

Hi,

I did a fresh installation as described below:

I downloaded the available driver for the 1660 Super from the NVIDIA website and installed it as per the instructions.
Then I installed CUDA 10.2 as per the instructions on the download page.
Copied the cuDNN 7.6.4 files into the lib folder as described on the NVIDIA cuDNN page.
After this installation, I created a new environment in Anaconda Navigator with Python 3.6 and tensorflow-gpu 2.0.
Tried running the code, importing Keras from TensorFlow as described in the tensorflow-gpu 2.0 user guide.

But I still ended up with the same issue when running model.fit: “Failed to get convolution algorithm. This is probably because cuDNN failed to initialize”

I tried adding the suggested snippet at the start of my script, but I still get the same error.

Please let me know if I am doing something wrong here, or if there is some other procedure. Also let me know whether the 1660 Super really supports CUDA/cuDNN/TensorFlow.

gupta.nikhil0126 · December 16, 2019, 4:53pm

Hi,

I used the NGC container nvcr.io/nvidia/tensorflow:19.11-tf2-py3.

I ran the same code and got exactly the same error:

tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node sequential/conv2d/Conv2D (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_1075]

Function call stack:
distributed_function

root@980a8ddaa84e:/mnt# python -m pip freeze | grep -i -e tensorflow -e keras
Keras-Applications==1.0.8
Keras-Preprocessing==1.0.5
tensorflow-addons==0.5.2
tensorflow-datasets==1.2.0
tensorflow-estimator==2.0.1
tensorflow-gpu==2.0.0+nv
tensorflow-metadata==0.15.0
root@980a8ddaa84e:/mnt#

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|

gupta.nikhil0126 · December 17, 2019, 8:29am

Hi, thanks for the help.

It finally worked, with the steps below:

Removed all the NVIDIA drivers: sudo apt-get remove --purge nvidia*
Removed all the CUDA versions: sudo apt-get remove --purge cuda*
Manually deleted the CUDA folders from /usr/local
Rebooted the PC.
Downloaded the driver from NVIDIA: Linux x64 (AMD64/EM64T) Display Driver, version 440.44
Installed the driver and restarted the system.
Tested the driver with: nvidia-smi
Added the repositories:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

Then installed CUDA with the command:

sudo apt-get install --no-install-recommends cuda-10-0

Downloaded the cuDNN library and samples: cudnn-10.0-linux-x64-v7.6.5.32.tgz, libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64
Copied the files using the instructions provided: Installation Guide :: NVIDIA Deep Learning cuDNN Documentation
Added the following paths to .bashrc:

export CUDA_ROOT=/usr/local/cuda
export PATH=/usr/local/cuda-10.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/targets/x86_64-linux/lib

Rebooted the PC.
Tested cuDNN; the test passed successfully.
Downloaded the container: 19.11-tf1-py3
Ran the container.

Below are the installed packages:

Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.0.5
tensorflow-estimator==1.15.1
tensorflow-gpu==1.15.0+nv

Ran the same code, and it worked perfectly!
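After an install like the one above, a quick way to see whether the directories on LD_LIBRARY_PATH actually contain the cuDNN libraries is sketched below. It is only an approximation of what the loader does: libraries can also be found through the ldconfig cache, so an empty result is a hint rather than proof. The function names are hypothetical.

```python
import glob
import os


def find_libraries(search_path, pattern="libcudnn*"):
    """Return files matching `pattern` in each directory of a path-list string."""
    matches = []
    for directory in search_path.split(os.pathsep):
        if directory:  # skip empty entries such as a trailing ':'
            matches.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return matches


def report_cudnn():
    """Print the cuDNN files visible through LD_LIBRARY_PATH, if any."""
    hits = find_libraries(os.environ.get("LD_LIBRARY_PATH", ""))
    if hits:
        for path in hits:
            print("found:", path)
    else:
        print("no libcudnn* files found on LD_LIBRARY_PATH")
    return hits
```

Running report_cudnn() in a fresh shell (so that .bashrc has been re-read) shows which copies of cuDNN the environment variables point at.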
