NVIDIA driver failing to load after upgrade from CUDA 8 to CUDA 9.2

In order to use TensorFlow >= 1.5.0, I tried to upgrade my Ubuntu 16.04 server with two GTX 1070 GPUs from CUDA 8 to CUDA 9.

On my first attempt, I used the local .deb installer for 9.1, but after installation, when I ran nvidia-smi, it complained:

“NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

I tried to download and reinstall the driver manually at this stage, but the .run file aborted saying the preinstall failed.
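I didn’t dig into the runfile’s own log at that point; if I remember rightly it goes to /var/log/nvidia-installer.log, and the failing pre-install step is usually named near the end:

$ sudo tail -n 50 /var/log/nvidia-installer.log   # shows why the pre-install check failed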

Today, I returned to the problem, this time installing 9.2 (using the network installer) after explicitly removing the old CUDA 8 install (which I realised I’d forgotten to do the first time) and, of course, the attempted 9.1 install. However, the same problem occurs: nvidia-smi reports the same error.

Note that ‘lspci | grep -i nvidia’ confirms the GPUs are there. Also, I get the following error from ‘sudo /sbin/modprobe nvidia’:

“modprobe: ERROR: could not insert ‘nvidia_396’: Exec format error”

I haven’t tried reinstalling the driver - is that what I need to do next?
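For reference, these are roughly the checks I’ve run so far (nvidia_396 is just the module name from the modprobe error on my system):

$ lspci | grep -i nvidia                                # both GTX 1070s show up on the PCI bus
$ dmesg | grep -iE 'nvidia|nvrm' | tail -n 20           # kernel log usually says why the insert failed
$ uname -r                                              # running kernel version
$ modinfo nvidia_396 | grep -iE '^(version|vermagic)'   # module version and the kernel it was built for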

You may need to follow the driver uninstall procedures from the documentation:

Installation Guide Linux :: CUDA Toolkit Documentation

Thanks for the reply, however /usr/bin/nvidia-uninstall does not exist.

Use the following command to uninstall a Toolkit runfile installation:

$ sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl

/usr/bin/nvidia-uninstall will not exist for a network deb install, and likewise the previously mentioned perl script for the toolkit uninstall will not exist either. This is already indicated in the linked instructions.

:) thank you for the explanation.

Then, as it seems to me, the only remaining way to remove the previously installed packages is the command below:

sudo apt-get --purge remove <package_name>          # Ubuntu
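For example, something along these lines should cover both the toolkit and the driver packages (the names are only my guess at the usual ones; check ‘apt list --installed’ for what is really there):

sudo apt-get --purge remove "cuda*"        # CUDA toolkit packages
sudo apt-get --purge remove "nvidia-*"     # driver packages
sudo apt-get autoremove --purge            # drop orphaned dependencies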

I had already purged cuda-8.0 (and the botched cuda-9.1) before installing 9.2, and ‘sudo apt list --installed | grep nvidia’ shows the following:

nvidia-396/unknown,now 396.26-0ubuntu1 amd64 [installed,automatic]
nvidia-396-dev/unknown,now 396.26-0ubuntu1 amd64 [installed,automatic]
nvidia-modprobe/unknown,now 396.26-0ubuntu1 amd64 [installed,automatic]
nvidia-opencl-icd-396/unknown,now 396.26-0ubuntu1 amd64 [installed,automatic]
nvidia-prime/xenial,now 0.8.2 amd64 [installed,automatic]
nvidia-settings/unknown,now 396.26-0ubuntu1 amd64 [installed,automatic]

It looks to me like the old drivers aren’t there, but modprobe doesn’t like the new driver:

$ sudo /sbin/modprobe nvidia
modprobe: ERROR: could not insert 'nvidia_396': Exec format error

Any idea as to the most likely reason for this message?

Maybe you could try a local runfile installation and see whether the outcome is different?

I managed to fix my problem.

It turns out that although I’d ensured gcc 5.4.0 was installed, the system was still defaulting to gcc 4.9.3.

I fixed that via:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 1
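A quick way to check which compiler the system resolves to, before and after the switch:

$ gcc --version                             # should now report 5.4.0
$ sudo update-alternatives --config gcc     # lists registered alternatives and the current default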

Then, after purging and reinstalling CUDA, everything worked until I realised I needed CUDA 9.0 for TensorFlow 1.5.0+, not 9.2. Cue another purge and install of 9.0, and I then got my setup working with the latest tensorflow_gpu version.
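For anyone retracing this, the final 9.0 install and sanity check were roughly as follows (cuda-9-0 is the versioned metapackage name in NVIDIA’s network repo, so apt doesn’t pull 9.2 back in; the Python one-liner just lists the devices TensorFlow can see):

sudo apt-get install cuda-9-0     # versioned metapackage, stays on 9.0
nvidia-smi                        # driver loads and shows both GTX 1070s
nvcc --version                    # should report release 9.0
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"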