CUDA 10 and Ubuntu 18.04

Dear all,

I am having trouble installing the CUDA 10 drivers on an Ubuntu 18.04 VM guest in VMware ESXi. Here are the steps I followed and the corresponding output:


Hardware: VMware ESXi 5.5.0 on an HP Z800, 2x 12c Xeon X5650, 74 GB RAM
2x Asus GTX 1080 8 GB, passed through to the VM
Guest VM: 1 VM with 4 cores and 24 GB RAM (VM version 8)

2.1 Pre-installation Actions

$ lspci | grep -i nvidia
0b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
1b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
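If you want to script this check, a minimal sketch is below. It assumes bash and GNU grep; `count_nvidia_gpus` is a hypothetical helper name, not part of any NVIDIA tooling. It counts NVIDIA entries reported as either a VGA or a 3D controller, so for the output above it should report 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NVIDIA GPUs in `lspci` output read from stdin.
# Both "VGA compatible controller" and "3D controller" lines are counted,
# because compute-only boards can report themselves as the latter.
count_nvidia_gpus() {
  grep -i 'nvidia' | grep -Eic 'VGA compatible controller|3D controller'
}
```

Usage: `lspci | count_nvidia_gpus`. Non-NVIDIA adapters such as the VM's emulated display are ignored.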

2.2 Verify You Have a Supported Version of Linux

$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
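The same platform check can be reduced to a yes/no answer. This is a sketch assuming bash; `os_id_version` and `is_ubuntu_1804_x86_64` are hypothetical helper names. It relies on os-release being shell-compatible `KEY=VALUE` text, and sources it in a subshell so the variables do not leak into the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "ID VERSION_ID" from an os-release file.
# Defaults to /etc/os-release; sourced in a subshell to keep the caller clean.
os_id_version() {
  local file="${1:-/etc/os-release}"
  (
    # shellcheck disable=SC1090
    . "$file"
    printf '%s %s\n' "${ID:-unknown}" "${VERSION_ID:-unknown}"
  )
}

# Hypothetical helper: succeed only for the Ubuntu 18.04 / x86_64 combination
# shown above. $1 = os-release path (optional), $2 = architecture (optional).
is_ubuntu_1804_x86_64() {
  local arch="${2:-$(uname -m)}"
  [ "$(os_id_version "$1")" = "ubuntu 18.04" ] && [ "$arch" = "x86_64" ]
}
```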

2.3 Verify the System Has gcc Installed

$ gcc --version
gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Check kernel version and headers

$ uname -r
4.15.0-45-generic
$ sudo apt-get install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree
Reading state information... Done
linux-headers-4.15.0-45-generic is already the newest version (4.15.0-45.48).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
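Both checks above can be scripted. The sketch below assumes bash and a Debian/Ubuntu-style layout where `/lib/modules/<release>/build` points at the installed kernel headers; `gcc_major_version` and `have_kernel_headers` are hypothetical helper names. It does not encode which gcc versions CUDA 10.0 supports; compare the printed major version against the installation guide yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major GCC version from `gcc --version`
# on stdin, e.g. "gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0" -> "7".
gcc_major_version() {
  head -n 1 | awk '{ split($NF, v, "."); print v[1] }'
}

# Hypothetical helper: check that headers for a kernel release are present.
# /lib/modules/<release>/build normally links to the headers tree; the
# optional second argument is a root prefix (handy for testing).
have_kernel_headers() {
  local release="${1:-$(uname -r)}" root="${2:-}"
  [ -d "$root/lib/modules/$release/build" ]
}
```

Usage: `gcc --version | gcc_major_version` and `have_kernel_headers && echo ok`.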

Check for nvidia packages

$ sudo apt list --installed|grep -i nvidia
(no packages)

Disable nouveau

$ cat /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
$ sudo update-initramfs -u
$ sudo reboot
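To confirm the blacklist is in place and took effect after the reboot, something like the following sketch could be used. It assumes bash and GNU grep; `nouveau_blacklisted` and `nouveau_loaded` are hypothetical helper names. The first checks for the two exact lines shown above; the second looks for nouveau in `lsmod` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the modprobe config contains both
# nouveau lines used above (exact whole-line matches).
nouveau_blacklisted() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  grep -qx 'blacklist nouveau' "$conf" &&
    grep -qx 'options nouveau modeset=0' "$conf"
}

# Hypothetical helper: succeed if nouveau appears in `lsmod` output on stdin.
nouveau_loaded() {
  grep -q '^nouveau '
}
```

Usage: `nouveau_blacklisted && ! lsmod | nouveau_loaded && echo "nouveau disabled"`.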

Install the CUDA toolkit and driver (runfile installer)

$ sudo ./cuda_10.0.130_410.48_linux.run
(installed in /usr/local/cuda-10.0)

Set PATH in .bashrc:

export PATH=/usr/local/cuda-10.0/bin:$PATH

Add /usr/local/cuda-10.0/lib64 to LD_LIBRARY_PATH in .bashrc:

export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
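The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` expansion in the LD_LIBRARY_PATH line adds the `:` separator only when the variable is already non-empty, so no empty entry (generally treated as the current directory) is appended. A small sketch demonstrating the idiom, assuming bash; `prepend_path` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to a colon-separated path list
# using the same ${VAR:+:${VAR}} idiom as the .bashrc line above.
# $1 = directory to prepend, $2 = current value (may be empty).
prepend_path() {
  local dir="$1" current="${2:-}"
  printf '%s\n' "${dir}${current:+:${current}}"
}
```

For example, `prepend_path /usr/local/cuda-10.0/lib64 ""` yields just the CUDA directory, with no trailing colon.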

reboot

Output of nvidia-smi

$ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:0B:00.0 Off |                  N/A |
| 26%   45C    P0    42W / 180W |      0MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:1B:00.0 Off |                  N/A |
| 22%   39C    P0    35W / 180W |      0MiB /  8119MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
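To cross-check that the driver sees the same passthrough devices that `lspci` listed, the bus IDs can be queried directly with `nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader`. The sketch below, assuming bash and awk, converts them to lspci's short form; `smi_bus_ids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "index, pci.bus_id" CSV lines from nvidia-smi
# on stdin and print each bus ID in lspci's short form (PCI domain dropped,
# lower case), e.g. "0, 00000000:0B:00.0" -> "0b:00.0".
smi_bus_ids() {
  awk -F', *' '{ id = tolower($2); sub(/^[0-9a-f]+:/, "", id); print id }'
}
```

Usage: `nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader | smi_bus_ids`, which for the output above should print `0b:00.0` and `1b:00.0`, matching `lspci`.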

Show loaded modules

$ lsmod|grep nvidia
nvidia_uvm            782336  0
nvidia_drm             45056  0
nvidia_modeset       1044480  1 nvidia_drm
nvidia              16797696  2 nvidia_uvm,nvidia_modeset
ipmi_msghandler        53248  2 ipmi_devintf,nvidia
drm_kms_helper        172032  2 vmwgfx,nvidia_drm
drm                   401408  5 vmwgfx,drm_kms_helper,nvidia_drm,ttm
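A scripted version of this check might look like the sketch below, assuming bash; `check_nvidia_modules` is a hypothetical helper name, and the module list is simply the set of NVIDIA modules seen in the `lsmod` output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lsmod` output on stdin and verify that the
# NVIDIA modules seen above are loaded. Prints each missing module and
# returns non-zero if any is absent.
check_nvidia_modules() {
  local listing missing=0 mod
  listing=$(awk '{ print $1 }')
  for mod in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
    if ! grep -qx "$mod" <<<"$listing"; then
      echo "missing: $mod"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `lsmod | check_nvidia_modules`.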

Run the device node verification script

$ verification.sh
mknod: /dev/nvidia0: File exists
mknod: /dev/nvidia1: File exists
mknod: /dev/nvidiactl: File exists
mknod: /dev/nvidia-uvm: File exists
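The `mknod: ...: File exists` messages only mean the nodes were already present. A sketch that reports which nodes are actually missing, instead of attempting to create them, assuming bash; `missing_nvidia_nodes` is a hypothetical helper name and the node list mirrors the names shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list NVIDIA device nodes missing under a directory.
# $1 = number of GPUs, $2 = device directory (default /dev).
missing_nvidia_nodes() {
  local count="$1" dir="${2:-/dev}" i node
  for ((i = 0; i < count; i++)); do
    [ -e "$dir/nvidia$i" ] || echo "$dir/nvidia$i"
  done
  for node in nvidiactl nvidia-uvm; do
    [ -e "$dir/$node" ] || echo "$dir/$node"
  done
}
```

Usage: `missing_nvidia_nodes 2`. Empty output means every node exists, so the mknod errors are harmless here.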

Check for 666 permissions

$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Feb  6 13:14 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Feb  6 13:14 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Feb  6 13:14 /dev/nvidiactl
crw-rw-rw- 1 root root 241,   0 Feb  6 13:14 /dev/nvidia-uvm
crw-rw-rw- 1 root root 241,   1 Feb  6 13:14 /dev/nvidia-uvm-tools
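The permission check can also be automated. This sketch assumes bash and GNU coreutils `stat` (its `%a` format prints the octal permission bits); `check_nvidia_perms` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any given file whose permission bits are
# not 666 (crw-rw-rw- for a character device). Returns non-zero if any
# file has other permissions or cannot be stat'ed.
check_nvidia_perms() {
  local bad=0 f mode
  for f in "$@"; do
    mode=$(stat -c '%a' "$f") || { bad=1; continue; }
    if [ "$mode" != "666" ]; then
      echo "$f has mode $mode"
      bad=1
    fi
  done
  return "$bad"
}
```

Usage: `check_nvidia_perms /dev/nvidia*`.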

However, the samples fail. For example, the clock sample:

/usr/local/cuda-10.0/samples/0_Simple/clock$ ./clock
CUDA Clock sample
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

CUDA error at clock.cu:112 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&dinput, sizeof(float) * NUM_THREADS * 2)"
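The CUDA runtime documentation describes cudaErrorDevicesUnavailable (46) as the devices being busy or unavailable, for example because of an exclusive or prohibited compute mode. The nvidia-smi output above already shows "Default" for both GPUs, so this is probably not the cause here, but a quick recheck right after the error is cheap. A sketch assuming bash and awk; `non_default_compute_modes` is a hypothetical helper that reads `nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "index, compute_mode" CSV lines on stdin and
# print every GPU whose compute mode is not "Default".
non_default_compute_modes() {
  awk -F', *' '$2 != "Default" { print "GPU " $1 ": " $2 }'
}
```

Usage: `nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader | non_default_compute_modes`. Empty output means both GPUs are in Default mode.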

I have tried several things, but nothing has worked so far. Any tips? Am I missing something?

Thank you,

Costas Voglis
Nodalpoint Systems, Athens

PS. Until 2 weeks ago, the system above had been running fine for a couple of years with CUDA 9 and the 384-series drivers.

The old configuration then suddenly stopped working with the same error (code=46), so we decided to update to the newest versions.

The same error message appeared after installing the CUDA 10 drivers on a CentOS 7.6 guest in VMware ESXi.

I believe your problem is that you are using a virtual machine!

Please check out my post on not using a virtual machine here:

https://cudaeducation.com/jetsonxavier/

You will also find more information on the challenges of running Ubuntu on a host machine with an NVIDIA graphics card attached. I believe I also discuss the VM experience there:

https://cudaeducation.com/ubuntujetsonxavier/

I hope this helps!

-Cuda Education
cudaeducation.com