I’m trying to run "$ python cifar10_train.py" in a TensorFlow env in anaconda2.
Getting this output:
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:99] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1562] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
Segmentation fault (core dumped)
$ nvidia-smi
Sun May 15 23:45:12 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.42     Driver Version: 361.42         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
| 40%   29C    P8     1W /  38W |    157MiB /  2047MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       915    G   /usr/lib/xorg/Xorg                             108MiB |
|    0      1327    G   compiz                                          37MiB |
+-----------------------------------------------------------------------------+
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
$ ls -la /usr/local/cuda/lib64/libcudnn*
lrwxrwxrwx 1 3319 users 13 Feb 9 12:48 libcudnn.so -> libcudnn.so.4
lrwxrwxrwx 1 3319 users 17 Feb 9 12:48 libcudnn.so.4 -> libcudnn.so.4.0.7
-rwxrwxr-x 1 3319 users 61453024 Feb 8 17:12 libcudnn.so.4.0.7
-rw-rw-r-- 1 3319 users 62025862 Feb 8 17:12 libcudnn_static.a
That setup makes things work great on the command line, but I’m not sure how to get TensorFlow (during setup via ./configure) to use a cuDNN version that isn’t installed in /usr/local/cuda.
Anybody have thoughts on this? I’m tempted to create some symlinks, but that seems like a hack. Any thoughts appreciated!
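For what it’s worth, one sketch of the non-symlink route: the configure script of that era prompted for the cuDNN location, and setting LD_LIBRARY_PATH covers the runtime side. Everything below is an assumption to verify against your own checkout (the CUDNN_INSTALL_PATH variable name and the `$HOME/cudnn` location are illustrative, not gospel):

```shell
# Sketch: point TensorFlow's ./configure at a cuDNN copy that lives
# outside /usr/local/cuda. CUDNN_INSTALL_PATH and the default
# $HOME/cudnn location are assumptions -- check the prompts your
# checkout's ./configure actually prints.
CUDNN_INSTALL_PATH="${CUDNN_INSTALL_PATH:-$HOME/cudnn}"

# Sanity check before running configure: the directory must contain
# lib64/libcudnn.so (or a versioned variant like libcudnn.so.4.0.7).
check_cudnn() {
  ls "$1"/lib64/libcudnn.so* >/dev/null 2>&1
}

if check_cudnn "$CUDNN_INSTALL_PATH"; then
  echo "cuDNN found under $CUDNN_INSTALL_PATH"
  # Make the runtime loader find it too, without touching /usr/local/cuda.
  export LD_LIBRARY_PATH="$CUDNN_INSTALL_PATH/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  # ./configure   # answer the cuDNN path prompt with $CUDNN_INSTALL_PATH
else
  echo "no libcudnn.so under $CUDNN_INSTALL_PATH" >&2
fi
```

The check matters because configure happily accepts a path and the failure only shows up later as the "Couldn't open CUDA library libcudnn.so" line from the original post.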
And it looks like rockpereira has the same installed versions I have (per nvidia-smi and nvcc -V), so it’s possible I’ll just be hit with the same problems down the road. Might be a red herring.
I was able to get TensorFlow built from source using the Xenial Ubuntu repos. The biggest issue I had was that TensorFlow expected all the CUDA and cuDNN libraries and headers to be installed under /usr/local/cuda. To deal with that I essentially had to create temporary symlinks, which was ugly. The upside was that I could use CUDA from the Xenial repo, cuDNN downloaded from NVIDIA, and the standard gcc (5.3.1), and they played nicely together. For the record I did the following:
sudo apt-get install nvidia-cuda-toolkit
sudo apt-get install nvidia-cuda-361-updates
sudo apt-get install nvidia-nsight
sudo apt-get install nvidia-profiler
sudo apt-get install libcupti-dev zlib1g-dev
# Put symlinks in /usr/local/cuda
sudo mkdir /usr/local/cuda
cd /usr/local/cuda
sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s /usr/include/ include
sudo ln -s /usr/bin/ bin
sudo ln -s /usr/lib/x86_64-linux-gnu/ nvvm
sudo mkdir -p extras/CUPTI
cd extras/CUPTI
sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s /usr/include/ include
# Install cuDNN
#http://askubuntu.com/questions/767269/how-can-i-install-cudnn-on-ubuntu-16-04
# Download cuDNN as detailed above and extract
cd ~/Downloads/cuda
sudo cp include/cudnn.h /usr/include
sudo cp lib64/libcudnn* /usr/lib/x86_64-linux-gnu/
sudo chmod a+r /usr/lib/x86_64-linux-gnu/libcudnn*
# ... Install TensorFlow from source ...
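Before kicking off the actual build, it’s worth sanity-checking that the symlink farm above actually resolves; a broken link here surfaces much later as a cryptic bazel or loader error. A minimal helper (the path list just mirrors the symlinks created above):

```shell
# Sketch: verify each /usr/local/cuda entry resolves to something that
# really exists, mirroring the symlink layout created above.
check_link() {
  # usage: check_link <path> -- prints OK/BROKEN, returns 0 on OK
  if [ -e "$(readlink -f "$1" 2>/dev/null)" ]; then
    echo "OK      $1 -> $(readlink -f "$1")"
  else
    echo "BROKEN  $1"
    return 1
  fi
}

for p in lib64 include bin extras/CUPTI/lib64 extras/CUPTI/include; do
  check_link "/usr/local/cuda/$p" || true   # report, keep going
done
```

Each BROKEN line points at a symlink whose target moved or was never created, which is exactly the failure mode of this hand-rolled layout after a package upgrade.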
(tf3) rock@ubuntu:~/anaconda2/envs/tf3/lib/python3.5/site-packages/tensorflow/models/image/cifar10 $ python cifar10_train.py
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.2545
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.79GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti$
2016-05-16 21:58:10.281520: step 0, loss = 4.68 (6.3 examples/sec; 20.168 sec/batch)
2016-05-16 21:58:13.245997: step 10, loss = 4.66 (616.7 examples/sec; 0.208 sec/batch)
2016-05-16 21:58:15.522023: step 20, loss = 4.64 (631.4 examples/sec; 0.203 sec/batch)
2016-05-16 21:58:17.588216: step 30, loss = 4.62 (622.5 examples/sec; 0.206 sec/batch)
2016-05-16 21:58:19.709766: step 40, loss = 4.60 (587.4 examples/sec; 0.218 sec/batch)
Chiming in to say that this also worked for me. Not the “cleanest” setup (apt regularly notifies me that /usr/lib/x86_64-linux-gnu/libcudnn.so.5 is not a symbolic link), but…it works! You definitely saved me some time, thank you for that.
I’m using the cuDNN 5.0.5 deb packages from NVIDIA’s download site (installed with "sudo dpkg -i libcudnn5*.deb"), and the Ubuntu repo versions of the rest of the CUDA toolkit / libraries. With this approach I didn’t have to copy the cuDNN files into the system directories (the deb puts them there for you), but I did use the above symlink approach to make it look like everything is installed in /usr/local/cuda. This worked great for the "./configure" step, thanks!
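On the "libcudnn.so.5 is not a symbolic link" nag: that warning comes from ldconfig, which expects the soname file (libcudnn.so.5) to be a symlink to the real versioned library rather than a regular file. If a tarball copy left a plain file behind, rebuilding the link chain should quiet it. A sketch, assuming the 5.0.5 filename mentioned above (adjust to whatever `ls` actually shows on your box):

```shell
# Sketch: rebuild the conventional shared-library link chain
#   libcudnn.so -> libcudnn.so.5 -> libcudnn.so.5.0.5
# so ldconfig stops warning that the soname is not a symbolic link.
fix_soname_link() {
  # usage: fix_soname_link <libdir> <real-file> <soname>
  # e.g.   fix_soname_link /usr/lib/x86_64-linux-gnu libcudnn.so.5.0.5 libcudnn.so.5
  ( cd "$1" || exit 1
    [ -f "$2" ] || exit 1        # the real library must already be there
    ln -sf "$2" "$3"             # soname -> real versioned file
    ln -sf "$3" "${3%.*}" )      # linker name (libcudnn.so) -> soname
}
# sudo bash -c 'fix_soname_link /usr/lib/x86_64-linux-gnu libcudnn.so.5.0.5 libcudnn.so.5'
# sudo ldconfig
```

The two-level chain is the standard Linux convention: the runtime loader resolves the soname, while the build-time linker (-lcudnn) resolves the unversioned name.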