Dear all,
I am having trouble installing the CUDA 10 drivers on an Ubuntu 18.04 guest VM under VMware ESXi. Here are the steps I followed and the corresponding output:
Hardware: VMware ESXi 5.5.0 on an HP Z800, 2× 12c Xeon X5650, 74 GB RAM
2× Asus GTX 1080 (8 GB), passed through to the VM
Guest: 1 VM with 4 cores and 24 GB RAM (VM version 8)
2.1 Pre-installation Actions
$ lspci | grep -i nvidia
0b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
1b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
2.2 Verify You Have a Supported Version of Linux
$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
2.3 Verify the System Has gcc Installed
$ gcc --version
gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ uname -r
4.15.0-45-generic
$ sudo apt-get install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree
Reading state information... Done
linux-headers-4.15.0-45-generic is already the newest version (4.15.0-45.48).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Check for NVIDIA packages
$ sudo apt list --installed|grep -i nvidia
(no packages)
Disable nouveau
$ cat /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
$ sudo update-initramfs -u
$ sudo reboot
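After the reboot, a quick sketch like this can confirm that nouveau actually stayed unloaded (it reads /proc/modules directly, so it does not depend on lsmod being on the PATH):

```shell
# Confirm the nouveau driver is no longer loaded (run after the reboot).
# Reads /proc/modules directly instead of relying on lsmod.
if grep -qw nouveau /proc/modules 2>/dev/null; then
    echo "nouveau is still loaded - check blacklist-nouveau.conf and re-run update-initramfs"
else
    echo "nouveau is not loaded"
fi
```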
Install CUDA via the runfile installer
$ sudo ./cuda_10.0.130_410.48_linux.run
Installed in /usr/local/cuda-10.0.
Set PATH in .bashrc:
export PATH=/usr/local/cuda-10.0/bin:$PATH
Add /usr/local/cuda-10.0/lib64 to LD_LIBRARY_PATH in .bashrc:
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Reboot.
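As a sanity check after editing .bashrc, a sketch along these lines verifies that the toolkit is where the installer put it and that nvcc resolves from the new PATH (the /usr/local/cuda-10.0 prefix is the one chosen above; adjust if yours differs):

```shell
# Sanity check: the CUDA 10.0 toolkit prefix used above (adjust if needed).
CUDA_HOME=/usr/local/cuda-10.0
if [ -x "$CUDA_HOME/bin/nvcc" ]; then
    "$CUDA_HOME/bin/nvcc" --version     # should report release 10.0
else
    echo "nvcc not found under $CUDA_HOME/bin - check the runfile installer log"
fi
# Also confirm the new PATH entry actually took effect in this shell.
command -v nvcc >/dev/null 2>&1 || echo "nvcc is not on PATH - re-source .bashrc"
```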
Show nvidia-smi
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:0B:00.0 Off | N/A |
| 26% 45C P0 42W / 180W | 0MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 00000000:1B:00.0 Off | N/A |
| 22% 39C P0 35W / 180W | 0MiB / 8119MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Show loaded modules
$ lsmod|grep nvidia
nvidia_uvm 782336 0
nvidia_drm 45056 0
nvidia_modeset 1044480 1 nvidia_drm
nvidia 16797696 2 nvidia_uvm,nvidia_modeset
ipmi_msghandler 53248 2 ipmi_devintf,nvidia
drm_kms_helper 172032 2 vmwgfx,nvidia_drm
drm 401408 5 vmwgfx,drm_kms_helper,nvidia_drm,ttm
Execute the device node verification script
$ verification.sh
mknod: /dev/nvidia0: File exists
mknod: /dev/nvidia1: File exists
mknod: /dev/nvidiactl: File exists
mknod: /dev/nvidia-uvm: File exists
Check for 666 permissions
$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Feb 6 13:14 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Feb 6 13:14 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Feb 6 13:14 /dev/nvidiactl
crw-rw-rw- 1 root root 241, 0 Feb 6 13:14 /dev/nvidia-uvm
crw-rw-rw- 1 root root 241, 1 Feb 6 13:14 /dev/nvidia-uvm-tools
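The permission check can also be scripted; this sketch flags any /dev/nvidia* node that is not 0666 (it assumes GNU stat, which Ubuntu ships):

```shell
# Flag any NVIDIA device node whose permissions are not rw for everyone (0666).
found=0
for dev in /dev/nvidia*; do
    [ -e "$dev" ] || continue          # glob did not match: no nodes present
    found=1
    perms=$(stat -c '%a' "$dev")       # GNU stat: permissions in octal
    if [ "$perms" = "666" ]; then
        echo "$dev: OK ($perms)"
    else
        echo "$dev: unexpected permissions ($perms)"
    fi
done
[ "$found" -eq 1 ] || echo "no /dev/nvidia* device nodes found"
```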
However, the samples fail with:
/usr/local/cuda-10.0/samples/0_Simple/clock$ ./clock
CUDA Clock sample
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1
CUDA error at clock.cu:112 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&dinput, sizeof(float) * NUM_THREADS * 2)"
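In case it helps, here is what I plan to check next: code 46 (cudaErrorDevicesUnavailable) typically means all CUDA devices are busy or unavailable, e.g. held by another process or left in an exclusive/prohibited compute mode. A sketch of diagnostics using only standard nvidia-smi query flags:

```shell
# cudaErrorDevicesUnavailable (46) usually means the GPUs are busy or in a
# non-default compute mode; these standard queries can help narrow it down.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -q -d COMPUTE                       # compute mode should be "Default"
    nvidia-smi --query-compute-apps=pid,process_name --format=csv  # anything holding the GPUs?
else
    echo "nvidia-smi not found"
fi
dmesg 2>/dev/null | grep -i nvrm | tail -n 20 || true   # recent driver (NVRM) messages, if any
```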
I have tried several things, but nothing has worked so far. Any tips? Am I missing something?
Thank you,
Costas Voglis
Nodalpoint Systems, Athens
PS: Until two weeks ago, the system above had been operational for a couple of years with CUDA 9 and the 384 family of drivers.