CUDA 9.1 on Ubuntu 16.04 installed, but deviceQuery fails

nvidia-bug-report.sh output: nvidia-bug-report.log.gz (uploaded to Google Drive)

$ uname -a
Linux roswell 4.15.0-38-generic #41~16.04.1-Ubuntu SMP Wed Oct 10 20:16:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I am following the CUDA installation guide. I need to use CUDA 9.1 because it’s the version used by the tools I ultimately need to work with.

I installed the latest NVIDIA driver:

$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  410.73  Sat Oct 20 22:12:33 CDT 2018
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)

I installed CUDA 9.1 from the net installer.

I built the samples.

deviceQuery failed:

$ bin/x86_64/linux/release/deviceQuery
bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
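For reference, the code 30 printed above corresponds to cudaErrorUnknown in the CUDA 9.x runtime. A quick lookup sketch (error values transcribed from the CUDA 9 headers; verify against your own driver_types.h, since CUDA 10.1 and later renumbered several of these, e.g. cudaErrorUnknown moved from 30 to 999):

```python
# A few cudaError_t values as defined in the CUDA 9.x headers.
CUDA9_ERRORS = {
    0: "cudaSuccess",
    3: "cudaErrorInitializationError",
    30: "cudaErrorUnknown",
    35: "cudaErrorInsufficientDriver",
    38: "cudaErrorNoDevice",
}

def decode(code):
    """Map a CUDA 9.x runtime error code to its enum name."""
    return CUDA9_ERRORS.get(code, "unrecognized code %d" % code)

print(decode(30))  # cudaErrorUnknown
```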

Any troubleshooting help would be most appreciated.

Here’s some output from your bug report log (specifically from dmesg | grep NVRM):

Nov 02 07:58:59 roswell kernel: NVRM: API mismatch: the client has the version 387.26, but
                                NVRM: this kernel module has the version 410.73.  Please
                                NVRM: make sure that this kernel module and all NVIDIA driver
                                NVRM: components have the same version.
Nov 02 07:58:59 roswell kernel: NVRM: API mismatch: the client has the version 387.26, but
                                NVRM: this kernel module has the version 410.73.  Please
                                NVRM: make sure that this kernel module and all NVIDIA driver
                                NVRM: components have the same version.
Nov 02 08:09:08 roswell systemd[1]: Configuration file /lib/systemd/system/nvidia-persistenced.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Nov 02 08:09:09 roswell userdel[29548]: delete user 'nvidia-persistenced'
Nov 02 08:09:09 roswell userdel[29548]: removed group 'nvidia-persistenced' owned by 'nvidia-persistenced'
Nov 02 08:09:09 roswell userdel[29548]: removed shadow group 'nvidia-persistenced' owned by 'nvidia-persistenced'
Nov 02 08:32:24 roswell gnome-session[1637]: (gnome-software:1755): As-WARNING **: failed to rescan: Failed to parse /usr/share/applications/nvidia-settings.desktop.dpkg-new file: cannot process file of type text/plain
Nov 02 08:32:24 roswell gnome-session[1637]: (gnome-software:1755): As-WARNING **: failed to rescan: Failed to parse /usr/share/applications/nvidia-settings.desktop.dpkg-tmp file: cannot process file of type text/plain
Nov 02 08:32:24 roswell gnome-session[1637]: (gnome-software:1755): As-WARNING **: failed to rescan: Failed to parse /usr/share/applications/nvidia-settings.desktop file: cannot process file of type application/x-desktop
Nov 02 08:32:29 roswell groupadd[8035]: group added to /etc/group: name=nvidia-persistenced, GID=131
Nov 02 08:32:29 roswell groupadd[8035]: group added to /etc/gshadow: name=nvidia-persistenced
Nov 02 08:32:29 roswell groupadd[8035]: new group: name=nvidia-persistenced, GID=131
Nov 02 08:32:29 roswell useradd[8039]: new user: name=nvidia-persistenced, UID=124, GID=131, home=/, shell=/sbin/nologin
Nov 02 08:32:29 roswell usermod[8044]: change user 'nvidia-persistenced' password
Nov 02 08:32:29 roswell chage[8049]: changed password expiry for nvidia-persistenced
Nov 02 08:32:29 roswell chfn[8052]: changed user 'nvidia-persistenced' information
Nov 02 08:55:23 roswell kernel: NVRM: API mismatch: the client has the version 410.72, but
                                NVRM: this kernel module has the version 410.73.  Please
                                NVRM: make sure that this kernel module and all NVIDIA driver
                                NVRM: components have the same version.
Nov 02 10:27:44 roswell kernel: NVRM: API mismatch: the client has the version 410.72, but
                                NVRM: this kernel module has the version 410.73.  Please
                                NVRM: make sure that this kernel module and all NVIDIA driver
                                NVRM: components have the same version.
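The mismatch reported in those lines can be extracted mechanically. A small sketch (Python; the log text is inlined from the excerpt above) that pulls the client and kernel-module versions out of NVRM dmesg lines:

```python
import re

# Sample dmesg lines, taken from the log excerpt above.
dmesg = """\
NVRM: API mismatch: the client has the version 387.26, but
NVRM: this kernel module has the version 410.73.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
"""

def find_mismatch(text):
    """Return (client_version, module_version) if NVRM reports a mismatch."""
    m = re.search(
        r"client has the version (\d+\.\d+).*?"
        r"kernel module has the version (\d+\.\d+)",
        text,
        re.S,
    )
    return m.groups() if m else None

print(find_mismatch(dmesg))  # ('387.26', '410.73')
```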

So it looks like you’ve been installing multiple driver versions.

If you want to use CUDA, I recommend installing drivers only from an NVIDIA source, not from PPA archives or any other third-party source. Furthermore, depending on how you installed each of these several drivers, things may be badly tangled. If at any point you mixed a runfile install with a previous package manager install, that is a recipe for breaking things.

I would recommend following the instructions in the linux install guide regarding “handling conflicting installations” to completely clean out all old installs of GPU drivers. Then pick a driver to install, and follow the linux install guide carefully.

I also note that your GPU is driving a display. If this is a laptop, be advised that laptop Linux installs may require extra effort, such as careful use of nvidia-prime.

Thanks for the quick reply.

I thought I did uninstall the first attempt, which was using the local installer. Apparently it left some junk around.

The install guide says to get the latest driver from NVIDIA. Check, got it. It only comes as a .run file, at least through the public website.

The pinned post here in the forum says to use the net installer for CUDA. It is only available as a .deb.

How do I reconcile these two best practices, which seem to be at odds with each other?

Since the net installer for CUDA seemed to include the 410.72 driver, do I really need the 410.73 driver that the .run file installs?

One pinned post refers to CUDA 8. Are you installing CUDA 8?
The pinned post referencing CUDA 9.1 doesn’t say anywhere in it “you must use a network installer”. It says:

Before installing CUDA 9.1, ensure that you have the latest NVIDIA driver R390 installed. The latest NVIDIA R390 driver is available at: www.nvidia.com/drivers

The CUDA network repositories have also been updated with the latest R390 driver packages. For more information about installing driver and CUDA from the network repository, see the Linux Installation Guide at: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

You can use a runfile installation of a driver with a deb-installed CUDA toolkit.

Read the linux install guide. It indicates how to do a toolkit-only install using a package manager method.

The problems arise when you attempt to install a driver via package manager, and then a driver via runfile installer, without doing a full cleanup in-between.

There is no conflict between a driver installed via the runfile installer and a CUDA toolkit (with no driver) installed via the package manager.

Read the linux install guide. In its entirety.

The 410.72 driver should be fine. The 410.73 should also be fine.

Just don’t mix a driver runfile install with a driver package manager install (unless you do a full cleanup in-between).

FYI to wrap this up.

I ran apt remove on all cuda and nvidia packages except for the repo definition.
I ran apt list -i | grep -i cuda and apt list -i | grep -i nvidia to make sure only the repo definition remained.

I then re-ran the net install for the “cuda-9-1” target and rebooted.
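The "only the repo definition remains" check described above can be mechanized. A small sketch (Python; the sample apt list lines and the cuda-repo package-name prefix are assumptions for illustration, not taken from the actual system):

```python
import re

# Illustrative output lines in the format of `apt list --installed`.
sample = """\
cuda-repo-ubuntu1604/now 9.1.85-1 amd64 [installed,local]
nvidia-410/now 410.73-0ubuntu1 amd64 [installed]
"""

def leftover_packages(apt_list_output, keep=("cuda-repo",)):
    """Return installed cuda/nvidia packages other than the repo definition."""
    leftovers = []
    for line in apt_list_output.splitlines():
        name = line.split("/", 1)[0]  # package name precedes the first '/'
        if re.search(r"cuda|nvidia", name, re.I) and not name.startswith(keep):
            leftovers.append(name)
    return leftovers

print(leftover_packages(sample))  # ['nvidia-410']
```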

The deviceQuery sample program now works.