The Nvidia support matrix indicates that I need CUDA 10.0 with driver 410 to run on my Titan RTX (Turing) card, because my target versions of TensorFlow want CUDA 10.0. Alternatively, I have also tried CUDA 10.1 with driver 418. Both are on Ubuntu 18.04.
- I have found that the Debian network installers for 10.0 and 10.1 actually install 10.2. It looks like someone packaged the wrong contents for
https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=debnetwork
which is supposed to install CUDA 10.0 with the 410 driver that is supposedly packaged with the release. The same happens for 10.1 with the "CUDA Toolkit 10.1 update2 Archive | NVIDIA Developer" download.
- I have tried to manually download the driver for 10.1 Update 2, which should be 418.87. However, the NVIDIA Advanced Driver Search page shows that the Titan RTX never had a 418.87 release, although it did have 418.56, 418.74, 418.88, and 418.113. The reference is here, but the list does not show up on macOS, only on Ubuntu, which is a problem since I often use my Mac for research separately from my Ubuntu machine.
https://www.nvidia.com/Download/Find.aspx?lang=en
The initial error in /var/log/nvidia-installer.log told me “Unable to load: nvidia-installer ncurses v6 user interface.” I did some research, and the advice was to remove the file /usr/lib/nvidia/pre-install, as it is guaranteed to fail with an exit 1 condition.
https://askubuntu.com/questions/798932/how-can-i-fix-unable-to-load-nvidia-installer-ncurses-v6-user-interface
That led to the next failure, which said that I need to uninstall the current nouveau driver before installing the new one. I followed the instructions here, but it looks like they may require a reboot, and I might have to ssh into my machine to change the driver, since I would then not have any drivers installed.
https://linuxconfig.org/how-to-disable-nouveau-nvidia-driver-on-ubuntu-18-04-bionic-beaver-linux
I’m having a very hard time believing that an end user is expected to deal with these types of packaging issues. I can’t imagine that I’m going down a fruitful path.
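For reference, the nouveau-disabling step mentioned above usually comes down to a small modprobe configuration file followed by an initramfs rebuild and a reboot. The sketch below only writes that file to a temporary location so it can be reviewed first. The function name `write_nouveau_blacklist` and the file name in the comments are hypothetical, and the system-wide steps are left as comments because they require root and a reboot.

```shell
# Write a modprobe configuration that blacklists the nouveau kernel driver.
# The target path is a parameter so the file can be reviewed before it is
# copied anywhere under /etc (nothing system-wide is changed here).
write_nouveau_blacklist() {
    target="$1"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

conf="$(mktemp)"
write_nouveau_blacklist "$conf"
cat "$conf"

# Afterwards, as root and at your own risk (file name is hypothetical):
#   cp "$conf" /etc/modprobe.d/blacklist-nvidia-nouveau.conf
#   update-initramfs -u
#   reboot
```

Since the display driver is gone after the reboot, having ssh access set up beforehand is a reasonable precaution.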
More background below:
The installation actually installs the CUDA 10.2 tools into the /usr/local/cuda-10.2 directory. For example, I have /usr/local/cuda-10.2/lib64/libcudart.so.10.2. TensorFlow errors out because it is looking for libcudart.so.10.0.
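A quick way to confirm which CUDA runtime versions actually ended up on disk is to look for the versioned libcudart files. This is a minimal sketch; the function name `list_cudart_versions` is made up for illustration, and it assumes the toolkits live under /usr/local as in the path above.

```shell
# List the CUDA runtime library versions (major.minor) found under a prefix.
# Defaults to /usr/local, where the toolkit directories were installed.
list_cudart_versions() {
    prefix="${1:-/usr/local}"
    # Match files such as .../cuda-10.2/lib64/libcudart.so.10.2 and keep
    # only the major.minor suffix; longer names (e.g. .so.10.2.89) are skipped.
    find "$prefix" -name 'libcudart.so.*' 2>/dev/null \
        | sed -n 's/.*libcudart\.so\.\([0-9][0-9]*\.[0-9][0-9]*\)$/\1/p' \
        | sort -u
}

list_cudart_versions /usr/local
```

If the output does not include 10.0, TensorFlow builds that link against libcudart.so.10.0 will fail to load the library as described.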
I also tried to install the CUDA 10.1.update2 for Linux x86_64 Ubuntu 18.04 with the deb(network) option and it also installed the CUDA 10.2 tools. It appears that someone has packaged the CUDA 10.2 tools inside CUDA 10.0.
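One thing that may be worth checking, offered as an assumption rather than something confirmed in this post: the network repository may resolve the unversioned `cuda` meta-package to the newest release it carries, while versioned meta-packages named like `cuda-10-0` stay on a specific release. The sketch below only prints the commands one might try. The helper names are hypothetical, and the package naming pattern should be verified with `apt-cache policy` before relying on it.

```shell
# Build the versioned meta-package name for a CUDA release, e.g. 10.0 -> cuda-10-0.
# ASSUMPTION: the repository names versioned meta-packages "cuda-<major>-<minor>".
cuda_meta_package() {
    printf 'cuda-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Print (but do not run) commands to inspect and install a specific release.
print_pinned_install() {
    pkg="$(cuda_meta_package "$1")"
    echo "apt-cache policy cuda $pkg"
    echo "sudo apt-get install $pkg"
}

print_pinned_install 10.0
```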
I tried to install the CUDA 10.1.update2 for Linux x86_64 Ubuntu 18.04 with the runfile (local) option. According to its name, it includes driver version 418. I got install errors in /var/log/cuda-installer.log that say “The distribution provided pre-install script failed…” [See attached photo]. The /var/log/nvidia-installer.log says the same thing: “The distribution provided pre-install script failed…” It goes on to talk about disabling the Nouveau kernel driver before proceeding, but everything I have read says that this step occurs after the new driver has been successfully installed.
My Ubuntu Software & Updates shows that I’m currently on Nouveau, and that nvidia-driver-435 (proprietary, tested) and nvidia-driver-430 (proprietary) are the standard options. It doesn’t look like either of these will work, since it looks like I need driver 410.48 or 418.87.00 according to the CUDA packaging.
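The driver that is actually loaded can also be checked from the command line, independently of what Software & Updates displays. This is a minimal sketch; the function name `loaded_gpu_modules` is hypothetical, and it reads the kernel's module list in the /proc/modules format.

```shell
# Report which GPU kernel modules (nouveau or nvidia) are loaded, reading a
# /proc/modules-style file. Prints nothing if neither is loaded.
loaded_gpu_modules() {
    modules_file="${1:-/proc/modules}"
    [ -r "$modules_file" ] || return 0
    # The first field of each line is the module name.
    awk '$1 == "nouveau" || $1 == "nvidia" { print $1 }' "$modules_file"
}

loaded_gpu_modules

# When the proprietary driver is loaded, its version is reported here.
if [ -r /proc/driver/nvidia/version ]; then
    cat /proc/driver/nvidia/version
fi
```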
I have tried various approaches to get the 410.48 or 418.87.00 drivers:
- Other install instructions
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install nvidia-driver-410
- TensorFlow documentation from
https://www.tensorflow.org/install/gpu
sudo apt-get install --no-install-recommends nvidia-418
The result is that it upgrades the driver during installation to 430.50, which, as far as I can tell, is incompatible with CUDA 10.0 or CUDA 10.1. The exact versions of the drivers are not on your driver archive page, either: https://www.nvidia.com/en-us/drivers/unix/
or https://www.nvidia.com/en-us/drivers/unix/linux-amd64-display-archive/
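Whether 430.50 is actually unusable depends on whether the driver versions listed with each CUDA release are exact requirements or minimums. If they are minimums (an assumption to verify against the CUDA release notes, not something established in this post), a version comparison like the following would tell whether an installed driver satisfies them. The function name `driver_at_least` is hypothetical, and the sketch relies on GNU `sort -V`, which ships with Ubuntu.

```shell
# Succeed if driver version $1 is greater than or equal to minimum version $2.
# Uses GNU sort's version ordering (-V) to compare dotted version strings.
driver_at_least() {
    lowest="$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$lowest" = "$2" ]
}

if driver_at_least 430.50 410.48; then
    echo "430.50 is at least 410.48"
fi
```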
I would like to get a working version of CUDA 10.0 (driver 410.48) or CUDA 10.1 update2 (driver 418.87.00). I’m at a total loss as to how to proceed, as I’ve tried all four installation options on your page for CUDA 10.1 and all four options for CUDA 10.1.update2, plus other variants. I’ve reinstalled Ubuntu 18.04 each time to have a clean starting point.
Do you support 410.48 and 418.87.00 for Titan RTX? I chose Titan RTX cards over RTX 2080 Tis because I wanted extra memory to run large models, and both are Turing. There are people with RTX 2080 Tis who are able to get it installed. My hardware config is very similar to a 2 workstation config by Lambda Labs, so it seems like this should be doable.
Thanks.