I am relatively new to ML and am getting the following error when running the ./mnistCUDNN test:
Error: CUDNN_STATUS_NOT_INITIALIZED
I have just upgraded to Linux 17.10. Can someone tell me whether the error might be due to a problem with my stack?
Linux 17.10
NVIDIA Titan Xp
NVIDIA driver 384
CUDA 9.0
cuDNN 7.0
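One way to tell whether the failure is in cuDNN itself rather than in the sample is to call cuDNN's handle-creation function directly. According to the cuDNN API reference, cudnnCreate returns CUDNN_STATUS_NOT_INITIALIZED when initialization of the CUDA runtime fails. The Python sketch below loads the library through ctypes and prints the status string from cudnnGetErrorString. The sonames it tries (libcudnn.so.7, then libcudnn.so) are assumptions for a cuDNN 7.x install.

```python
import ctypes

# Assumed sonames: libcudnn.so.7 for the cuDNN 7.x releases in this thread,
# with the unversioned name as a fallback. Adjust for your install.
CUDNN_CANDIDATES = ("libcudnn.so.7", "libcudnn.so")


def load_library(candidates):
    """Return a ctypes handle for the first library that loads, or None."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # not on the loader path; try the next name
    return None


def cudnn_create_status(lib):
    """Create and destroy one cuDNN handle; return cuDNN's status string."""
    lib.cudnnGetErrorString.restype = ctypes.c_char_p
    lib.cudnnGetErrorString.argtypes = [ctypes.c_int]
    handle = ctypes.c_void_p()
    status = lib.cudnnCreate(ctypes.byref(handle))
    if status == 0:  # CUDNN_STATUS_SUCCESS
        lib.cudnnDestroy(handle)
    return lib.cudnnGetErrorString(status).decode()


if __name__ == "__main__":
    lib = load_library(CUDNN_CANDIDATES)
    if lib is None:
        print("cuDNN shared library not found by the dynamic loader")
    else:
        print("cudnnCreate ->", cudnn_create_status(lib))
```

If this prints CUDNN_STATUS_NOT_INITIALIZED as well, the sample code is not the problem and the CUDA/cuDNN installation is the place to look.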
Not sure what you mean by “I have just upgraded to Linux 17.10”
If by that you mean:
- I had all this working on a previous OS version.
- then I went into that setup and upgraded to 17.10, without reinstalling anything else
Then yes, you have probably broken your stack. Perform the basic CUDA verification steps. Not sure what those are? Refer to the Linux install guide. Also run nvidia-smi and make sure it gives sane output.
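The nvidia-smi part of that check can be scripted. A minimal sketch, assuming nvidia-smi's default table output contains a "Driver Version:" field (true of the 384.x output posted in this thread, but this is screen-scraping, not a stable interface):

```python
import re
import shutil
import subprocess


def driver_version_from_smi(text):
    """Extract the driver version from nvidia-smi's default table header."""
    match = re.search(r"Driver Version:\s*([0-9.]+)", text)
    return match.group(1) if match else None


def query_driver_version():
    """Run nvidia-smi and return the driver version, or None if it fails."""
    if shutil.which("nvidia-smi") is None:
        return None  # tool not installed or not on PATH
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # driver not loaded or GPU not visible
    return driver_version_from_smi(result.stdout)


if __name__ == "__main__":
    print("driver version:", query_driver_version())
```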
Thank you for your reply.
Actually, I have rebuilt from the ground up (everything is freshly installed from downloads and repositories). I have run into fewer installation issues with Ubuntu 17.10 than when starting from Ubuntu 16.04, so I would like to stay with it if possible, but during cuDNN testing I hit the problem described in my original post. My question is whether this stack ought to work and, if so, whether you can tell me where the problem arises.
Thank you!
17.10 is not an officially supported distro for CUDA at this time (read the Linux install guide).
Did you verify the CUDA install?
Here are the results:
Tue Mar 6 15:09:22 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:01:00.0  On |                  N/A |
| 23%   35C    P5    16W / 250W |    205MiB / 12188MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       877      G   /usr/lib/xorg/Xorg                            15MiB |
|    0       934      G   /usr/bin/gnome-shell                          50MiB |
|    0      1199      G   /usr/lib/xorg/Xorg                            71MiB |
|    0      1342      G   /usr/bin/gnome-shell                          64MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
I am not certain how to read these, but they look consistent with what I have found online describing a “successful installation.”
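One thing worth checking in the output above: nvcc reports release 8.0 (V8.0.61), while the stack listed in the first post is CUDA 9.0. That may mean an older toolkit is first on the PATH. A small sketch that compares the nvcc on PATH with the intended release; the regex assumes the "release X.Y" wording shown above:

```python
import re
import shutil
import subprocess


def nvcc_release(text):
    """Parse 'release X.Y' from nvcc --version output into (X, Y)."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_toolkit(expected):
    """Compare the nvcc found on PATH with the toolkit you meant to install."""
    if shutil.which("nvcc") is None:
        return "nvcc not on PATH"
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    found = nvcc_release(out)
    if found != expected:
        return f"nvcc reports {found}, expected {expected}"
    return "ok"


if __name__ == "__main__":
    print(check_toolkit((9, 0)))  # CUDA 9.0 is the toolkit listed in the first post
```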
Perform basic CUDA verification steps. Not sure what those are? Refer to the linux install guide.
Please see the results above.
Please read the verification section in the Linux install guide:
Installation Guide Linux :: CUDA Toolkit Documentation
Thank you for your reply. I have reinstalled everything, following the instructions for 9.1. Note that 9.0 does not appear to include deviceQuery, so I was unable to perform that check, but the NVIDIA_CUDA-9.0_Samples check out. For instance, the output of 0_Simple/simplePitchLinearTexture is:
simplePitchLinearTexture starting…
GPU Device 0: “TITAN Xp” with compute capability 6.1
Bandwidth (GB/s) for pitch linear: 3.84e+02; for array: 3.86e+02
Texture fetch rate (Mpix/s) for pitch linear: 4.80e+04; for array: 4.82e+04
simplePitchLinearTexture completed, returned OK
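Among other things, deviceQuery prints the CUDA driver and runtime versions. Those two numbers can also be read by calling cudaRuntimeGetVersion and cudaDriverGetVersion from the CUDA runtime library. A rough ctypes sketch, assuming the runtime is reachable as libcudart.so.9.0 (the soname is an assumption; adjust it to your toolkit):

```python
import ctypes


def decode_cuda_version(value):
    """CUDA encodes versions as 1000 * major + 10 * minor (e.g. 9000 -> 9.0)."""
    return value // 1000, (value % 1000) // 10


def runtime_and_driver_versions(lib):
    """Ask libcudart for the runtime version and the driver's supported version."""
    runtime, driver = ctypes.c_int(), ctypes.c_int()
    lib.cudaRuntimeGetVersion(ctypes.byref(runtime))
    lib.cudaDriverGetVersion(ctypes.byref(driver))  # 0 if no driver is installed
    return decode_cuda_version(runtime.value), decode_cuda_version(driver.value)


if __name__ == "__main__":
    # Assumed soname for a CUDA 9.0 install; adjust to match your toolkit.
    try:
        cudart = ctypes.CDLL("libcudart.so.9.0")
    except OSError:
        print("libcudart.so.9.0 not found by the dynamic loader")
    else:
        runtime, driver = runtime_and_driver_versions(cudart)
        print("runtime", runtime, "driver supports up to", driver)
```

If the driver's supported version is lower than the runtime version, the runtime cannot initialize, which is one plausible route to an initialization error further up the stack.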
I reinstalled cuDNN after running the above tests, and the mnistCUDNN test still fails:
cudnnGetVersion() : 7101 , CUDNN_VERSION from cudnn.h : 7101 (7.1.1)
Host compiler version : GCC 6.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 30 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 12188, MemClock 5705.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
CUDNN failure
Error: CUDNN_STATUS_NOT_INITIALIZED
mnistCUDNN.cpp:394
Aborting…
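The first line of that output compares the library version returned by cudnnGetVersion() with the CUDNN_VERSION macro from cudnn.h. Because the two agree here, a header/library mismatch seems an unlikely cause. For cuDNN 7.x the number is encoded as MAJOR × 1000 + MINOR × 100 + PATCHLEVEL, so 7101 is 7.1.1 (the encoding may differ in other cuDNN major versions). A sketch that parses the line and decodes it:

```python
import re


def parse_version_line(line):
    """Return (library_version, header_version) from mnistCUDNN's first line."""
    match = re.search(
        r"cudnnGetVersion\(\) : (\d+) , CUDNN_VERSION from cudnn\.h : (\d+)", line
    )
    return (int(match.group(1)), int(match.group(2))) if match else None


def decode_cudnn_version(value):
    """cuDNN 7.x: CUDNN_VERSION = MAJOR * 1000 + MINOR * 100 + PATCHLEVEL."""
    return value // 1000, (value % 1000) // 100, value % 100


if __name__ == "__main__":
    line = "cudnnGetVersion() : 7101 , CUDNN_VERSION from cudnn.h : 7101 (7.1.1)"
    lib_version, header_version = parse_version_line(line)
    print("match" if lib_version == header_version else "MISMATCH",
          decode_cudnn_version(lib_version))
```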
OK, so CUDA appears to be working in your setup.
gcc 6.4.0 is not a supported host compiler version for CUDA at this time. However I don’t know for sure that is the problem here. I probably wouldn’t be able to say what is wrong without setting up your exact test case.
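To check the host compiler against the install guide mechanically, the version mnistCUDNN reports can be parsed and compared with the supported list. The supported set in this sketch is a hypothetical placeholder, deliberately left empty; fill it in from the table in the CUDA Linux installation guide for your distribution:

```python
import re

# Hypothetical placeholder: (major, minor) GCC versions taken from the CUDA
# Linux installation guide for your distribution. Left empty on purpose.
SUPPORTED_GCC = set()


def host_gcc_version(line):
    """Parse 'Host compiler version : GCC X.Y.Z' from mnistCUDNN's output."""
    match = re.search(r"GCC (\d+)\.(\d+)\.(\d+)", line)
    return tuple(int(part) for part in match.groups()) if match else None


def is_listed(version, supported=SUPPORTED_GCC):
    """True if the (major, minor) part of a parsed version is in the supported set."""
    return version is not None and version[:2] in supported


if __name__ == "__main__":
    version = host_gcc_version("Host compiler version : GCC 6.4.0")
    print(version, "listed" if is_listed(version) else "not in the supported list")
```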
I have downgraded to gcc-5 and got (perhaps) half a step further.
When I try to compile mnistCUDNN, I now get an error about FreeImage (below). As far as I can tell, FreeImage (both the base and dev packages) is installed correctly; I tested it with minimal programs from the web, since I could not find a dedicated verification step.
g++: No such file or directory
WARNING - FreeImage is not set up correctly. Please ensure FreeImage is set up correctly. <<<
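The second line appears to come from a FreeImage check in the sample's Makefile, and the first may mean that g++ itself is not on the PATH after the downgrade (an assumption; the message is ambiguous). A quick sketch that checks both: whether g++ is found, whether FreeImage.h exists in the usual include directories, and whether the dynamic linker can find libfreeimage. The include directories are assumptions for distro packages:

```python
import ctypes.util
import os
import shutil

# Assumed default include directories for distro packages; adjust as needed.
INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def header_present(name, dirs=INCLUDE_DIRS):
    """True if the header exists in any of the given include directories."""
    return any(os.path.isfile(os.path.join(d, name)) for d in dirs)


def freeimage_report():
    """Collect the three yes/no checks into a dict."""
    return {
        "g++ on PATH": shutil.which("g++") is not None,
        "FreeImage.h found": header_present("FreeImage.h"),
        "libfreeimage found": ctypes.util.find_library("freeimage") is not None,
    }


if __name__ == "__main__":
    for check, ok in freeimage_report().items():
        print(f"{check}: {'yes' if ok else 'NO'}")
```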
Does this offer enough insight for you to suggest next steps? Alternatively, is there an Ubuntu-based stack you would recommend for the Titan Xp? Ultimately, I will be installing TensorFlow.
Thank you!
To get a well-curated TF stack set up rapidly, I would use NGC:
NGC Documentation
There are instructions for setup of NGC on your own system here:
Using NGC with Your NVIDIA TITAN or Quadro PC Setup Guide :: NVIDIA GPU Cloud Documentation
I am trying to get cuDNN running. I am on Ubuntu 16.04 with a GTX 950 card. I have installed and tested CUDA 8.0 (recommended for now for TensorFlow), and I have installed cuDNN from these .deb files:
libcudnn7_7.1.2.21-1+cuda8.0_amd64.deb
libcudnn7-dev_7.1.2.21-1+cuda8.0_amd64.deb
libcudnn7-doc_7.1.2.21-1+cuda8.0_amd64.deb
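Since the problem later turned out to be a broken .deb install, it can be worth asking dpkg whether those packages are fully installed. A sketch using dpkg-query; the package names are taken from the file names above, and a healthy package is assumed to report the status "install ok installed":

```python
import shutil
import subprocess

# Package names taken from the .deb file names listed above.
PACKAGES = ("libcudnn7", "libcudnn7-dev", "libcudnn7-doc")


def parse_status(output):
    """Split '<want> <error> <status> <version>' into (healthy, version)."""
    parts = output.split()
    healthy = parts[:3] == ["install", "ok", "installed"]
    version = parts[3] if healthy and len(parts) > 3 else None
    return healthy, version


def package_status(name):
    """Query dpkg for one package; (False, None) if dpkg or the package is missing."""
    if shutil.which("dpkg-query") is None:
        return False, None
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Status} ${Version}", name],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return False, None  # dpkg does not know the package
    return parse_status(result.stdout)


if __name__ == "__main__":
    for pkg in PACKAGES:
        print(pkg, package_status(pkg))
```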
When I try to run mnistCUDNN, I get this:
cudnnGetVersion() : 7102 , CUDNN_VERSION from cudnn.h : 7102 (7.1.2)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 6 Capabilities 5.2, SmClock 1392.5 Mhz, MemSize (Mb) 1995, MemClock 3305.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
CUDNN failure
Error: CUDNN_STATUS_NOT_INITIALIZED
mnistCUDNN.cpp:394
Aborting…
What does it mean that the status is not initialized?
Thanks,
Art Edwards
Never mind:
The deb package had not installed correctly. When I installed from the .tgz file, everything worked.
Art Edwards
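When a .deb copy and a .tgz copy of cuDNN have both been installed, it can help to see which libcudnn files the dynamic linker's cache knows about. A sketch that parses ldconfig -p output. Note that a copy under /usr/local/cuda/lib64 only appears there if that directory is configured for ldconfig, and directories supplied through LD_LIBRARY_PATH are not in the cache at all:

```python
import re
import shutil
import subprocess


def cudnn_entries(ldconfig_output):
    """Return (soname, path) pairs for libcudnn entries in 'ldconfig -p' output."""
    pattern = re.compile(r"^\s*(libcudnn\S*)\s+\(.*\)\s+=>\s+(\S+)\s*$")
    return [m.groups() for m in map(pattern.match, ldconfig_output.splitlines()) if m]


def list_cudnn_copies():
    """Ask the dynamic-linker cache which libcudnn files it knows about."""
    ldconfig = shutil.which("ldconfig") or "/sbin/ldconfig"
    try:
        out = subprocess.run([ldconfig, "-p"], capture_output=True, text=True).stdout
    except OSError:
        return []  # ldconfig unavailable on this system
    return cudnn_entries(out)


if __name__ == "__main__":
    for soname, path in list_cudnn_copies():
        print(soname, "->", path)
```

More than one path for the same soname is a hint that a stale copy may be shadowing the one you meant to use.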