GeForce GTX 1660 Super: CUDA not working in Anaconda

I recently purchased a GTX 1660 Super graphics card and installed it in my Ubuntu 18.04 Linux system, but I am not able to use it for my deep learning programs. I am currently using an Anaconda Jupyter notebook with Python 3.6, Keras 2.3.1, tensorflow 2.0, tensorflow-gpu 2.0, cuDNN 7.6.4, cudatoolkit 10.0.130 and NVIDIA driver 410.

With the above drivers and packages I am not able to run my code. The error I get is:
“Failed to get convolution algorithm. This is probably because cuDNN failed to initialize”, and sometimes “out of memory” as well.

Please let me know how I can solve this issue, specifically with respect to Anaconda Navigator.

Hi,

As a quick smoke test, can you check that the nvidia-smi command works in the terminal?

If that works with no errors, then perhaps you’re running out of memory during your application. You can run watch -n 0.1 nvidia-smi in a separate shell while your app is running to see if the memory looks like it approaches the maximum before the error.
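The same check can be scripted if you want to log memory use from Python rather than watch it by eye. This is only a sketch: it assumes nvidia-smi is on the PATH and uses its CSV query output, and the helper names parse_gpu_memory and query_gpu_memory are made up for illustration.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs (one GPU per line) into a list of tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory use in MiB (needs the NVIDIA driver)."""
    output = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_memory(output)


# Example on a captured line of output, so it runs without a GPU:
sample = "424, 5941\n"
for used, total in parse_gpu_memory(sample):
    print("{} MiB of {} MiB in use ({:.0%})".format(used, total, used / total))
```

On a machine with the driver installed, calling query_gpu_memory() in a loop with a short sleep gives roughly the same picture as the watch command.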

Hi

I ran the command in a separate shell. The memory fills up to 6 GB and the error appears at the same moment. It is a sudden jump from 65 MB to 6 GB of memory. How can this happen? It seems that with only 100 MB of data the memory is full.

Hi,

I can’t say for sure without knowing what code you’re running, but the sudden jump I would assume is loading some dataset or model into memory, which is larger than your available 6GB and hence the OOM error. I may be able to help more if you share your scripts, but in general this doesn’t seem like a bug, just seems like your GPU doesn’t have enough memory for the task you’re trying to accomplish.

Hi,

The dataset is small, around 163 MB. Please find below the code I am trying to run with the package versions mentioned above. The code works on the CPU, but fails when run on the GPU.

import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import numpy as np
import os

# batch, classes, epochs
batch_size = 32
num_classes = 10
epochs = 50

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# model architecture
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# compile the model
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

# convert to float, normalise the data
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# train 
model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True)

Hi,

I just ran your code and confirmed that the model only uses about 1 GB of GPU memory. By default, TensorFlow allocates almost all of the GPU memory right at the start. If other processes are using GPU memory, that might make it run out.
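As a rough sanity check that the jump to 6 GB is pre-allocation rather than the model itself, the trainable parameters of the network in the posted script can be counted by hand. This is a back-of-the-envelope sketch, not a measurement: it counts weights only, and activations, cuDNN workspace and framework overhead come on top. The helper names are made up for illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return inputs * units + units


# Spatial size through the posted model:
# 32 -> (same) 32 -> (valid) 30 -> pool 15 -> (same) 15 -> (valid) 13 -> pool 6
flatten_size = 6 * 6 * 64

layer_params = [
    conv2d_params(3, 3, 32),    # first Conv2D on RGB input
    conv2d_params(3, 32, 32),
    conv2d_params(3, 32, 64),
    conv2d_params(3, 64, 64),
    dense_params(flatten_size, 512),
    dense_params(512, 10),
]

total_params = sum(layer_params)
weights_mib = total_params * 4 / 2**20  # float32 = 4 bytes per parameter
print(total_params, "parameters, about", round(weights_mib, 1), "MiB as float32")
```

The weights come to a few MiB, so they cannot account for a 6 GB allocation on their own.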

You can set the config to dynamically grow GPU memory as needed, and this way you shouldn’t run out unless the model actually requires more than you have.

Try adding this code snippet at the top of your script.

For TF1, using NGC container “nvcr.io/nvidia/tensorflow:19.10-py3”:

root@3efd20740a2a:/mnt# python -m pip freeze | grep -i -e tensorflow -e keras
Keras==2.3.1
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0+nv

import keras
import tensorflow as tf
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
keras.backend.tensorflow_backend.set_session(sess)

Source: https://github.com/keras-team/keras/issues/11584#issuecomment-438052384


For TF2, using NGC container “nvcr.io/nvidia/tensorflow:19.11-tf2-py3”:

root@c03f96d089ad:/mnt# python -m pip freeze | grep -i -e tensorflow -e keras
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.0.5
tensorflow-addons==0.5.2
tensorflow-datasets==1.2.0
tensorflow-estimator==2.0.1
tensorflow-gpu==2.0.0+nv
tensorflow-metadata==0.15.0

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

Source: https://github.com/tensorflow/tensorflow/issues/25138#issuecomment-559339162
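TensorFlow 2.x also has a native way to request memory growth without going through the compat.v1 session, via tf.config.experimental. The sketch below is an alternative to the snippet above, not what was tested in this thread; the wrapper name enable_gpu_memory_growth is made up for illustration.

```python
def enable_gpu_memory_growth():
    """Turn on memory growth for every visible GPU; return how many were set."""
    # Imported inside the function so it can be defined before TensorFlow loads.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Has to run before any op touches the GPU,
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Call enable_gpu_memory_growth() as the first statement of the script, before building the model. A return value of 0 means TensorFlow does not see any GPU at all, which points to a driver or CUDA problem rather than a memory one.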

Hi,

Can you please let me know which NVIDIA driver, cuDNN and CUDA versions I should use with the 1660 Super for the TF1 or TF2 Docker container? I currently have the latest NVIDIA driver 440.3, cuDNN 7.6.4 and cudatoolkit 10.0.130 installed. I think these might be conflicting.

Hi gupta,

I don’t believe it will conflict, though I can’t say for sure. Can you share the commands you’re running and the corresponding full errors you’re getting?

Hi,

I did a fresh installation as described below:

  1. I downloaded the available driver for the 1660 Super from the NVIDIA website and installed it as per the instructions.

  2. Then I installed CUDA 10.2 as per the instructions on the download page.

  3. Copied the cuDNN 7.6.4 files into the lib folder as described on the NVIDIA cuDNN page.

  4. After this installation, I created a new environment in Anaconda Navigator with Python 3.6 and tensorflow-gpu 2.0.

  5. Tried running the code by importing Keras from TensorFlow as described in the tensorflow-gpu 2.0 user guide.

But I still ended up with the same error when running model.fit: “Failed to get convolution algorithm. This is probably because cuDNN failed to initialize”

I tried adding the suggested snippet to the start of my script, but still got the same error.

Please let me know if I am doing something wrong here, or whether I should follow some other procedure. Also let me know whether the 1660 Super really supports CUDA/cuDNN/TensorFlow.
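One thing worth checking in a setup like this is whether the system CUDA version matches what the TensorFlow binary was built against. The sketch below hard-codes a few rows of TensorFlow's published "tested build configurations" table as recalled (an assumption; verify against the official page), and the helper name expected_cuda_if_mismatched is made up for illustration. It is not a confirmed diagnosis of the error in this thread.

```python
# Assumed values, recalled from TensorFlow's tested build configurations table.
TESTED_BUILDS = {
    "1.14.0": {"cuda": "10.0", "cudnn": "7.4"},
    "1.15.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.0.0": {"cuda": "10.0", "cudnn": "7.4"},
}


def expected_cuda_if_mismatched(tf_version, installed_cuda):
    """Return the CUDA version the build expects, or None if it matches."""
    expected = TESTED_BUILDS[tf_version]["cuda"]
    installed_major_minor = ".".join(installed_cuda.split(".")[:2])
    return None if installed_major_minor == expected else expected


# The combinations mentioned in this thread:
print(expected_cuda_if_mismatched("2.0.0", "10.0.130"))
print(expected_cuda_if_mismatched("2.0.0", "10.2"))
```

A non-None result only means the pip wheel will look for a different CUDA runtime than the one installed system-wide. Conda environments and NGC containers ship their own CUDA libraries, so the system toolkit matters less there.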

Hi,

I used the NGC container “nvcr.io/nvidia/tensorflow:19.11-tf2-py3”.

I ran the same code and got exactly the same error:

tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node sequential/conv2d/Conv2D (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_1075]

Function call stack:
distributed_function

root@980a8ddaa84e:/mnt# python -m pip freeze | grep -i -e tensorflow -e keras
Keras-Applications==1.0.8
Keras-Preprocessing==1.0.5
tensorflow-addons==0.5.2
tensorflow-datasets==1.2.0
tensorflow-estimator==2.0.1
tensorflow-gpu==2.0.0+nv
tensorflow-metadata==0.15.0
root@980a8ddaa84e:/mnt#

root@980a8ddaa84e:/Documents/bosch_data# nvidia-smi
Mon Dec 16 16:46:47 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   45C    P8    11W / 125W |    424MiB /  5941MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|

Hi, thanks for the help.

It finally worked, with the steps below:

  1. Removed all the NVIDIA drivers: sudo apt-get remove --purge nvidia*
  2. Removed all the CUDA versions: sudo apt-get remove --purge cuda*
  3. Manually deleted the CUDA folders from /usr/local
  4. Rebooted the PC
  5. Downloaded the driver from NVIDIA: https://www.nvidia.in/Download/driverResults.aspx/156094/en-in
    version: 440.44
  6. Installed the driver and restarted the system
  7. Tested the driver with: nvidia-smi
  8. Added the repositories:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

  9. Then installed CUDA with the command:

sudo apt-get install --no-install-recommends cuda-10-0

  10. Downloaded the cuDNN library and samples: cudnn-10.0-linux-x64-v7.6.5.32.tgz, libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64

  11. Copied the files using the instructions provided: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

  12. Added the following paths to .bashrc:

export CUDA_ROOT=/usr/local/cuda
export PATH=/usr/local/cuda-10.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/targets/x86_64-linux/lib

  13. Rebooted the PC

  14. Tested cuDNN

  15. The test passed successfully

  16. Downloaded the container: 19.11-tf1-py3

  17. Ran the container

  18. Below are the packages installed:

Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.0.5
tensorflow-estimator==1.15.1
tensorflow-gpu==1.15.0+nv

  19. Ran the same code.

It worked perfectly!
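After an install like the one above, a quick way to confirm which cuDNN the dynamic linker actually picks up is to load it from Python and ask for its version. This is a sketch under assumptions: the library name libcudnn.so.7 matches a cuDNN 7.x install, the packed version format (major*1000 + minor*100 + patch) is the one cuDNN 7.x uses, and the helper names are made up for illustration.

```python
import ctypes


def decode_cudnn_version(raw):
    """Split cuDNN 7.x's packed integer into (major, minor, patch)."""
    return raw // 1000, (raw % 1000) // 100, raw % 100


def loaded_cudnn_version(library="libcudnn.so.7"):
    """Load cuDNN via the dynamic linker; return its version or None if not found."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        # Library is missing or not on LD_LIBRARY_PATH.
        return None
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return decode_cudnn_version(lib.cudnnGetVersion())


# Decoding works without a GPU; 7605 is what cuDNN 7.6.5 reports.
print(decode_cudnn_version(7605))
```

If loaded_cudnn_version() returns None, the paths exported in .bashrc are the first thing to recheck.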