[Solved] Tensorflow 1.14 - Cuda 10.0 - GTX 970 - Ubuntu 18.04

Hi,

I would like some guidance on which driver, toolkit, and cuDNN versions to install. I’ve spent about 3 days in a row installing, testing, failing, and reinstalling Ubuntu from scratch, maybe 7 times in a day, without success. I’ve read so many different websites and forums about it that I still can’t figure it out.

Main goal is to use tensorflow-gpu for fine-tuning AI models.

This GPU, a GTX 970, has only 4 GB, but I’ll tweak the code a little so it can be used. I think it’s better than nothing.

I only got it working once, on Ubuntu 18.04.5, but I can’t reproduce it anymore; I’m lost among so many compatibility issues.

Since tensorflow 1.14 supports only:
Cuda 10.0
Cudnn 7.4

tensorflow gpu - compatibility table

The issue comes in when installing the NVIDIA driver, for which only these are available:

vendor : NVIDIA Corporation
model : GM204 [GeForce GTX 970]
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-415 - third-party free
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-450 - distro non-free
driver : nvidia-driver-460 - third-party free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
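
A note on the list above: the CUDA 10.0.130 toolkit needs Linux driver 410.48 or newer (the same version that appears in the local installer's file name), so nvidia-driver-390 is too old for it, while 410 and later should work. A minimal sketch of that check, assuming GNU `sort -V` is available; `MIN_DRIVER`, `version_ge`, and `driver_ok` are names made up for illustration, and the sample version numbers are only examples:

```shell
#!/bin/sh
# Minimum Linux driver for the CUDA 10.0.130 toolkit (per NVIDIA's release notes).
MIN_DRIVER="410.48"

# version_ge A B: succeeds when version A >= version B (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# driver_ok VERSION: succeeds when VERSION is new enough for CUDA 10.0.
driver_ok() {
    version_ge "$1" "$MIN_DRIVER"
}

# Example driver versions only; substitute the one you actually have.
for v in 390.141 410.104 450.102; do
    if driver_ok "$v"; then
        echo "$v: new enough for CUDA 10.0"
    else
        echo "$v: too old for CUDA 10.0"
    fi
done
```

On a machine with a working driver, the version to feed in can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.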

I’ve been doing this; maybe you can spot something I’m missing:

sudo apt update
sudo apt -y install build-essential

sudo apt update
sudo apt -y install linux-headers-$(uname -r)

sudo apt -y install nvidia-driver-390

wget -nc https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt update
sudo apt -y install cuda-10-0
echo "export PATH=\"/usr/local/cuda-10.0/bin:\$PATH\"" >> ~/.bashrc

wget -nc http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.4.1.5-1+cuda10.0_amd64.deb
wget -nc http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7_7.4.1.5-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64.deb
sudo apt update
sudo apt -y install libcudnn7=7.4.1.5-1+cuda10.0 libcudnn7-dev=7.4.1.5-1+cuda10.0

# https://developer.nvidia.com/rdp/cudnn-download
tar axvf cudnn-10.0-linux-x64-v7.6.0.64.tgz
sudo mv cuda/include/cudnn.h /usr/local/cuda/include
sudo mv cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
rm -rf cuda

sudo ldconfig
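
One thing that may be worth checking in this attempt: the .deb packages install cuDNN 7.4.1.5, and then the tarball copies 7.6.0.64 into /usr/local/cuda, so two different cuDNN versions can end up on the system. A small sketch for reading the version out of a cuDNN 7 header, assuming the header uses the usual `CUDNN_MAJOR` / `CUDNN_MINOR` / `CUDNN_PATCHLEVEL` defines; `cudnn_version` is a hypothetical helper name:

```shell
#!/bin/sh
# cudnn_version HEADER: print MAJOR.MINOR.PATCHLEVEL from a cuDNN 7 cudnn.h.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

# Location used by the tarball steps above; the .deb puts its header elsewhere.
header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    echo "cuDNN header version: $(cudnn_version "$header")"
else
    echo "no cudnn.h at $header"
fi
```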

I’ve tried also this approach:

sudo apt update
sudo apt -y install build-essential

sudo apt update
sudo apt -y install linux-headers-$(uname -r)

sudo apt -y install nvidia-driver-450

wget -nc https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt update
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb

sudo apt update
sudo apt -y install cuda-10-0

wget -nc https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo dpkg -i nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt update
sudo apt -y install libcudnn7 libcudnn7-dev libnccl2 libc-ares-dev

sudo apt autoremove
sudo apt upgrade

sudo mkdir -p /usr/local/cuda-10.0/nccl/lib
sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/
sudo ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/local/cuda-10.0/lib64/

sudo reboot
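
In this second attempt, `libcudnn7` is installed without a version pin, so apt may pick the newest build in the repository, which is not necessarily the one built for CUDA 10.0. A rough sketch for checking which CUDA release an installed package targets, assuming the package version carries a `+cudaX.Y` suffix as in the file names above; `cuda_suffix` is a made-up helper name:

```shell
#!/bin/sh
# cuda_suffix VERSION: print the CUDA release a libcudnn7 package was built for,
# e.g. "7.4.1.5-1+cuda10.0" -> "10.0". Prints nothing if there is no such suffix.
cuda_suffix() {
    printf '%s\n' "$1" | sed -n 's/.*+cuda\([0-9][0-9.]*\)$/\1/p'
}

# Installed version as reported by dpkg; stays empty when the package is absent.
installed=$(dpkg-query -W -f='${Version}' libcudnn7 2>/dev/null || true)

if [ -n "$installed" ]; then
    echo "libcudnn7 $installed targets CUDA $(cuda_suffix "$installed")"
else
    echo "libcudnn7 is not installed"
fi
```

If the result is not 10.0, pinning the version explicitly (as in the first attempt, `libcudnn7=7.4.1.5-1+cuda10.0`) is one way to avoid the mismatch.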

If you know a workaround to get it done, please let me know!

Keywords:
GTX 970 nvidia driver = 450
Ubuntu = 18.04.5
tensorflow-gpu = 1.14
Required Cuda = 10.0
Required Cudnn = 7.4

Solution:
It required a downgrade to Ubuntu 18.04.1, since the kernel shipped with 18.04.5 was not supported.
Toolkit_10.0_Installation_Guide
Reading the whole document helped a lot.
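
For anyone hitting the same wall, the kernel mismatch can be spotted before installing anything. A minimal sketch, assuming the guide's system requirements table lists the 4.15 kernel series for Ubuntu 18.04 (please confirm that value against the table yourself); `SUPPORTED_SERIES` and `kernel_series` are names invented for this example:

```shell
#!/bin/sh
# Kernel series assumed to be listed for Ubuntu 18.04 in the CUDA 10.0 guide.
# This value is an assumption; check it against the guide before relying on it.
SUPPORTED_SERIES="4.15"

# kernel_series RELEASE: reduce a release string to major.minor,
# e.g. "4.15.0-29-generic" -> "4.15".
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

running=$(uname -r)
if [ "$(kernel_series "$running")" = "$SUPPORTED_SERIES" ]; then
    echo "kernel $running is in the $SUPPORTED_SERIES series"
else
    echo "kernel $running is outside the $SUPPORTED_SERIES series"
fi
```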

Here is the code I ran:

#start
sudo apt-get -y install linux-headers-$(uname -r)
sudo apt-get update
sudo apt-get -y install gcc-4.8-base
sudo apt-get -y install build-essential

wget -nc https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

echo "export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}"
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
echo "export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
/usr/bin/nvidia-persistenced --verbose
cat /proc/driver/nvidia/version
sudo apt-get -y install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
sudo reboot
#end
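
Note that the echo lines in the script only print the export statements, and the exports themselves last only for the current shell, so the paths are gone after the reboot unless they are also written to a startup file. A sketch of one way to persist them without duplicating lines on repeated runs; `append_once` is a hypothetical helper, and the demo writes to a scratch file rather than the real ~/.bashrc:

```shell
#!/bin/sh
# append_once LINE FILE: append LINE to FILE unless an identical line is already there.
append_once() {
    grep -qxF -e "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demo on a scratch file; set rc="$HOME/.bashrc" to apply it for real.
rc=$(mktemp)

# Single quotes keep ${PATH...} unexpanded so it is evaluated at login time.
append_once 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' "$rc"
append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' "$rc"

# Running the same call again changes nothing.
append_once 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' "$rc"

cat "$rc"
```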

Wed Jan 27 23:45:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 00000000:01:00.0  On |                  N/A |
| 49%   30C    P8    15W / 250W |    152MiB /  4039MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       962      G   /usr/lib/xorg/Xorg                            76MiB |
|    0      1180      G   /usr/bin/gnome-shell                          69MiB |
|    0      2169      G   /usr/lib/firefox/firefox                       3MiB |
+-----------------------------------------------------------------------------+
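
To confirm the result in a script rather than by eye, the driver and CUDA versions can be pulled out of the banner shown above. A small sketch, assuming the banner keeps the `Driver Version:` and `CUDA Version:` labels it has in this driver release; `smi_field` is a made-up helper name:

```shell
#!/bin/sh
# smi_field LABEL: read nvidia-smi output on stdin and print the first
# dotted number that follows "LABEL:".
smi_field() {
    sed -n "s/.*$1: *\([0-9][0-9.]*\).*/\1/p" | head -n 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
    banner=$(nvidia-smi 2>/dev/null || true)
    echo "driver: $(printf '%s\n' "$banner" | smi_field 'Driver Version')"
    echo "cuda:   $(printf '%s\n' "$banner" | smi_field 'CUDA Version')"
else
    echo "nvidia-smi not found"
fi
```

With tensorflow-gpu 1.14 installed, `python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"` is a quick follow-up check that TensorFlow itself can see the card.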

Thanks.