Recently I discovered that TensorFlow can no longer access the GPU: `tensorflow.config.list_physical_devices()` lists only the CPU, not the GPU. I also noticed that the CUDA version reported by `nvcc --version` (Cuda compilation tools, release 10.1, V10.1.243) is older than the version TensorFlow requires (11.2). Since I was going to do a fresh CUDA install anyway, I decided to bump TensorFlow from 2.4.1 to 2.5.0 as well, to take advantage of the newest features.
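Both symptoms above can be checked from Python. The sketch below is only an illustration and is not part of NVIDIA's or TensorFlow's tooling. It parses the `release X.Y` string printed by `nvcc --version`, compares it with the 11.2 figure quoted above, and, if TensorFlow is importable, lists the GPUs TensorFlow sees via `tf.config.list_physical_devices("GPU")`. The helper names (`parse_nvcc_release`, `cuda_is_new_enough`, `local_nvcc_output`, `tensorflow_gpus`) are hypothetical. Note that `nvcc` reports whichever toolkit comes first on `PATH`, which is not necessarily the CUDA runtime TensorFlow loads.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# CUDA version this post quotes as TensorFlow's requirement (11.2); adjust as needed.
REQUIRED_CUDA: Tuple[int, int] = (11, 2)


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text like 'Cuda compilation tools, release 10.1, V10.1.243'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_is_new_enough(nvcc_output: str, required: Tuple[int, int] = REQUIRED_CUDA) -> bool:
    """True if the toolkit reported by nvcc is at least `required`."""
    version = parse_nvcc_release(nvcc_output)
    return version is not None and version >= required


def local_nvcc_output() -> Optional[str]:
    """Run `nvcc --version` for the first nvcc on PATH; None if nvcc is not installed."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return result.stdout


def tensorflow_gpus() -> Optional[list]:
    """GPUs visible to TensorFlow, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    output = local_nvcc_output()
    if output is None:
        print("nvcc not found on PATH")
    else:
        status = "ok" if cuda_is_new_enough(output) else "older than required"
        print("nvcc release:", parse_nvcc_release(output), status)
    print("TensorFlow GPUs:", tensorflow_gpus())
```

On the system described here, the nvcc check would report release (10, 1) as older than required, and an empty GPU list from TensorFlow would match the symptom described above.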
When I try to install CUDA, apt complains that many packages have unmet dependencies.
But let me start from the beginning: I followed this guide from NVIDIA.
- Pre-installation checks:
`lspci | grep -i nvidia` lists:
2d:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
2d:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
2d:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
2d:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
- I think my Linux should be compatible. I'm on Linux Mint 20.1, which is based on Ubuntu 20.04, which is supported. Also, TensorFlow used to recognize the GPU on this same system.
`gcc --version` gives 9.3.0, the same as in the system requirements for Ubuntu 20.04.
`uname -r` gives 5.4.0-73-generic (the system requirements list 5.4.0).
- I tried to install MLNX_OFED with
sudo ./mlnxofedinstall --with-nvmf --with-nfsrdma --enable-gds --add-kernel-support
but it failed with
sudo: ./mlnxofedinstall: command not found
so I skipped this step. I don't think I need it anyway.
- I removed the current CUDA installation with
sudo apt-get --purge remove cuda
- As suggested here, I also removed everything else connected to NVIDIA with
sudo apt-get remove --purge '^nvidia-.*'
- I followed the installation instructions for Linux/x86_64/Ubuntu/20.04/deb (network) from NVIDIA, and the first few commands ran successfully:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
- But the last command,
sudo apt-get -y install cuda
failed with the error:
The following packages have unmet dependencies:
 cuda : Depends: cuda-11-4 (>= 11.4.0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
- As suggested by a moderator on the NVIDIA forums, I kept adding every package that apt said it couldn't install to the command, until I got to:
(tf_gpu) lukas@Makushin:/usr/local$ sudo apt install cuda cuda-11-4 cuda-runtime-11-4 cuda-demo-suite-11-4 cuda-drivers cuda-drivers-470 nvidia-driver-470 nvidia-settings nvidia-installer-cleanup nvidia-alternative xserver-xorg-video-nvidia-470 glx-alternative-nvidia xserver-xorg-video-nvidia-470
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package glx-alternative-nvidia is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
- I have no clue where to get `glx-alternative-nvidia` from, and I don't think installing CUDA is supposed to be this complicated. After all, I installed it just two months ago without trouble. So this is where I gave up.
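The pre-installation checks in the list above can also be scripted. This is a minimal sketch that assumes the requirements quoted in this post (kernel 5.4.0 and GCC 9.3.0 for Ubuntu 20.04); the thresholds are hard-coded from the post rather than read from NVIDIA's documentation, and the function names are hypothetical. It checks whether `lspci` reports an NVIDIA VGA or 3D controller, reads the kernel release with `platform.release()` (the same string `uname -r` prints on Linux), and takes the first X.Y.Z version from the first line of `gcc --version`.

```python
import platform
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# Values quoted in this post from NVIDIA's system requirements for Ubuntu 20.04.
EXPECTED_KERNEL: Version = (5, 4, 0)
EXPECTED_GCC: Version = (9, 3, 0)


def first_version(text: str) -> Optional[Version]:
    """Return the first X.Y.Z in `text`, e.g. '5.4.0-73-generic' -> (5, 4, 0)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


def has_nvidia_gpu(lspci_output: str) -> bool:
    """True if some lspci line is an NVIDIA VGA or 3D controller (case-insensitive, like grep -i)."""
    for line in lspci_output.splitlines():
        is_display = "VGA compatible controller" in line or "3D controller" in line
        if is_display and "nvidia" in line.lower():
            return True
    return False


def command_output(cmd: list) -> str:
    """Return stdout of `cmd`, or '' if the program is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def preinstall_report() -> dict:
    """Run the three checks; lspci and gcc must be on PATH for their checks to pass."""
    gcc_first_line = command_output(["gcc", "--version"]).split("\n", 1)[0]
    return {
        "nvidia_gpu": has_nvidia_gpu(command_output(["lspci"])),
        "kernel": first_version(platform.release()) == EXPECTED_KERNEL,
        "gcc": first_version(gcc_first_line) == EXPECTED_GCC,
    }


if __name__ == "__main__":
    for check, ok in preinstall_report().items():
        print(f"{check}: {'ok' if ok else 'mismatch'}")
```

Comparing only the X.Y.Z prefix means a kernel such as 5.4.0-73-generic counts as a match for 5.4.0, which mirrors how the requirement is read above.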
Does anybody have an idea what’s going on there?
What steps do I need to follow to get my GPU working with TensorFlow?
- Linux Mint 20.1
- NVIDIA GeForce RTX 2080 graphics card (compute capability 7.5)
Cross-posted from Unix & Linux Stack Exchange.