ERROR: Cannot create cuDNN handle. cuDNN won't be available

Hi Everybody,

I have two problems with DIGITS. (Note: I have installed only Caffe 0.15, not TensorFlow.)

Problem 1) When I run "./digits-devserver" on localhost:5000, I get these errors:

[WARNING] Failed to load 2 jobs.
[DEBUG] 20180424-202847-b6fe - IOError: [Errno 2] No such file or directory: '/home/deep/digits/digits/jobs/20180424-202847-b6fe/status.pickle'
[DEBUG] 20180424-203035-3b6a - IOError: [Errno 2] No such file or directory: '/home/deep/digits/digits/jobs/20180424-203035-3b6a/status.pickle'

Problem 2) When I train a model, I get this output (error):

[DEBUG] Network sanity check - train
[DEBUG] Network sanity check - val
[DEBUG] Network sanity check - deploy
[INFO ] Train Caffe Model task started.
[INFO ] Task subprocess args: "/home/deep/caffe/build/tools/caffe train --solver=/home/deep/digits/digits/jobs/20180424-213359-b64a/solver.prototxt --gpu=0"
[ERROR] Train Caffe Model: Cannot create cuDNN handle. cuDNN won’t be available.
[ERROR] Train Caffe Model: Cannot create cuDNN handle. cuDNN won’t be available.
[ERROR] Train Caffe Model: Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM
[ERROR] Train Caffe Model: Cannot create cuDNN handle. cuDNN won’t be available.
[ERROR] Train Caffe Model task failed with error code -6

  • My system settings are as follows.
  • ~/.bashrc includes the following:
    export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export CAFFE_ROOT=~/caffe
    export DIGITS_ROOT=~/digits
    export PYTHONPATH=/home/deep/caffe/python:${PYTHONPATH:+:${PYTHONPATH}}
    
    $ nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.111                Driver Version: 384.111                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On   |                  N/A |
    | N/A   48C    P2    30W /  N/A |    619MiB /  8111MiB   |      0%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1044      G   /usr/lib/xorg/Xorg                           344MiB |
    |    0      1828      G   compiz                                       150MiB |
    |    0     14169      G   ...-token=17443F43231739145F843E03090635BC   122MiB |
    +-----------------------------------------------------------------------------+
    
    $ dpkg -l | egrep 'digits|caffe|libcudnn|libnccl|cudart|nvidia'
    
    ii  cuda-cudart-9-0                             9.0.176-1                                    amd64        CUDA Runtime native Libraries
    ii  cuda-cudart-dev-9-0                         9.0.176-1                                    amd64        CUDA Runtime native dev links, headers
    ii  libcudnn7                                   7.1.3.16-1+cuda9.1                           amd64        cuDNN runtime libraries
    ii  libcudnn7-dev                               7.1.3.16-1+cuda9.1                           amd64        cuDNN development libraries and headers
    ii  libcudnn7-doc                               7.0.5.15-1+cuda9.0                           amd64        cuDNN documents and samples
    ii  nvidia-384                                  384.111-0ubuntu1                             amd64        NVIDIA binary driver - version 384.111
    ii  nvidia-384-dev                              384.111-0ubuntu1                             amd64        NVIDIA binary Xorg driver development files
    ii  nvidia-machine-learning-repo-ubuntu1604     1.0.0-1                                      amd64        nvidia-machine-learning repository configuration files
    ii  nvidia-modprobe                             390.30-0ubuntu1                              amd64        Load the NVIDIA kernel driver and create device files
    ii  nvidia-opencl-icd-384                       384.111-0ubuntu1                             amd64        NVIDIA OpenCL ICD
    ii  nvidia-prime                                0.8.2                                        amd64        Tools to enable NVIDIA's Prime
    ii  nvidia-settings                             390.30-0ubuntu1                              amd64        Tool for configuring the NVIDIA graphics driver
    

    Edit for Problem 1: If I delete the jobs (datasets, models, etc.) before quitting the DIGITS server, Problem 1 disappears when I re-run ./digits-devserver. However, I don't think that counts as a solution.

    I solved Problem 1 by installing TensorFlow 1.7. However, Problem 2 remains the same; I have not been able to solve it yet. Is there anybody who can help me with this problem? Moderators used to help without hesitation. Unfortunately, they do not help anymore.

    Problem 2 is also solved. I had installed the wrong cuDNN version (see the dpkg output above: libcudnn7 7.1.3.16-1+cuda9.1), which is not compatible with cuda-9.0. I removed cuDNN and installed the right version.
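
For anyone hitting the same error, here is a rough sketch of how the mismatch could be spotted from the dpkg listing. It assumes the cuDNN .deb packages carry a "+cudaX.Y" suffix in their version string, as in my output above. The helper names (cuda_suffix, mismatched_packages) are made up for this post and are not part of DIGITS, Caffe or cuDNN.

```python
import re


def cuda_suffix(pkg_version):
    """Return the CUDA version named in a '+cudaX.Y' suffix, or None if absent."""
    match = re.search(r"\+cuda(\d+\.\d+)", pkg_version)
    return match.group(1) if match else None


def mismatched_packages(dpkg_lines, cuda_version):
    """List (package, built_for) pairs for libcudnn packages built for another CUDA."""
    bad = []
    for line in dpkg_lines:
        fields = line.split()
        # dpkg -l rows look like: status, name, version, arch, description...
        if len(fields) < 3 or not fields[1].startswith("libcudnn"):
            continue
        name, version = fields[1], fields[2]
        built_for = cuda_suffix(version)
        if built_for is not None and built_for != cuda_version:
            bad.append((name, built_for))
    return bad


# Rows copied from the dpkg output in this post; "9.0" is the CUDA toolkit in use.
sample = [
    "ii  libcudnn7      7.1.3.16-1+cuda9.1  amd64  cuDNN runtime libraries",
    "ii  libcudnn7-dev  7.1.3.16-1+cuda9.1  amd64  cuDNN development libraries and headers",
    "ii  libcudnn7-doc  7.0.5.15-1+cuda9.0  amd64  cuDNN documents and samples",
]
print(mismatched_packages(sample, "9.0"))
```

On my listing this flags libcudnn7 and libcudnn7-dev (built for CUDA 9.1) and leaves libcudnn7-doc alone. It only compares version strings, so it cannot tell whether the library actually loads.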