deviceQuery - invalid device ordinal - Ubuntu 14.04 Server

Hi guys, I’m stuck trying to figure this one out… This is with a GTX 750 Ti passed through to a Linux guest VM.

I followed this guide: http://www.r-tutor.com/gpu-computing/cuda-installation/cuda6.5-ubuntu

Same issue as in the title, ‘invalid device ordinal’.

I removed everything, attempted to install the driver manually, and then repeated the process with no luck.

I did run ‘make’ on all the samples and they compiled fine.

Here is the output of the commands commonly requested for support issues… I’d really appreciate any assistance!

# uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.2 LTS"
NAME="Ubuntu"
VERSION="14.04.2 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.2 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

#  nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12

# gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

# ls -la /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Mar  1 11:25 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Mar  1 11:24 /dev/nvidiactl

# cat /etc/modprobe.d/disable-nouveau.conf
blacklist nouveau
options nouveau modeset=0

# cat /etc/modprobe.d/blacklist
blacklist amd76x_edac
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv

# lspci |grep -i nvidia
0b:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2)

# lspci | grep -i NVIDIA | grep "VGA compatible controller"
0b:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2)

# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  340.29  Thu Jul 31 20:23:19 PDT 2014
GCC version:  gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)

# echo $LD_LIBRARY_PATH
/usr/local/cuda-6.5/lib64

# echo $PATH
/usr/local/cuda-6.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games

# nvidia-settings

ERROR: The control display is undefined; please run `nvidia-settings --help` for
       usage information.

# nvidia-smi
Unable to determine the device handle for GPU 0000:0B:00.0: Unknown Error

# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 10
-> invalid device ordinal
Result = FAIL

# ./deviceQueryDrv
./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
cuInit(0) returned 101
-> CUDA_ERROR_INVALID_DEVICE (device specified is not a valid CUDA device)
Result = FAIL

And an interesting turn of events from dmesg… That doesn’t look good… Should I try moving the card to another PCI-e slot?

# dmesg |grep NVRM
[    4.955879] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  340.29  Thu Jul 31 20:23:19 PDT 2014
[   71.509297] NVRM: RmInitAdapter failed! (0x23:0x2f:566)
[   71.509301] NVRM: rm_init_adapter failed for device bearing minor number 0
[   71.509383] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
[  194.759023] NVRM: RmInitAdapter failed! (0x23:0x2f:566)
[  194.759035] NVRM: rm_init_adapter failed for device bearing minor number 0
[  194.759117] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
[  438.095030] NVRM: RmInitAdapter failed! (0x23:0x2f:566)
[  438.095034] NVRM: rm_init_adapter failed for device bearing minor number 0
[  438.095115] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
[  478.065527] NVRM: RmInitAdapter failed! (0x23:0x2f:566)
[  478.065531] NVRM: rm_init_adapter failed for device bearing minor number 0
[  478.065613] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
[  569.087090] NVRM: RmInitAdapter failed! (0x23:0x2f:566)
[  569.087094] NVRM: rm_init_adapter failed for device bearing minor number 0
[  569.087176] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
[  656.594251] NVRM: RmInitAdapter failed! (0x23:0x2f:566)
[  656.594255] NVRM: rm_init_adapter failed for device bearing minor number 0
[  656.594337] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
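
For anyone hitting the same thing, the log above repeats one three-line pattern every time something tries to open the device. A rough Python sketch for pulling just the RmInitAdapter failures out of dmesg text is below. The function name find_init_failures and the regular expression are my own illustration, based only on the line format shown in this thread; other driver versions may log differently.

```python
import re

# Matches lines shaped like the ones in this thread, e.g.
# [   71.509297] NVRM: RmInitAdapter failed! (0x23:0x2f:566)
# Assumption: the format is taken from the 340.29 output above only.
FAIL_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+NVRM: RmInitAdapter failed! \(([^)]+)\)"
)


def find_init_failures(dmesg_text):
    """Return a list of (timestamp_seconds, error_code) tuples,
    one per 'RmInitAdapter failed!' line found in dmesg_text."""
    failures = []
    for line in dmesg_text.splitlines():
        match = FAIL_RE.match(line.strip())
        if match:
            failures.append((float(match.group(1)), match.group(2)))
    return failures


# Two of the failure blocks copied from the log above.
sample = """\
[   71.509297] NVRM: RmInitAdapter failed! (0x23:0x2f:566)
[   71.509301] NVRM: rm_init_adapter failed for device bearing minor number 0
[  194.759023] NVRM: RmInitAdapter failed! (0x23:0x2f:566)
"""

for timestamp, code in find_init_failures(sample):
    print(f"{timestamp:10.3f}s  RmInitAdapter failed, code {code}")
```

To run it against a live system, the text could come from something like subprocess.run(["dmesg"], capture_output=True, text=True).stdout instead of the sample string. If the same code shows up on every attempt, as it does here, the adapter is failing to initialize the same way each time rather than intermittently.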

Not all NVIDIA GPUs are enabled for use in a VM passthrough setting. I suspect GTX 750Ti is not.

Thanks… it should be, I’m not seeing any complaints from VMware, and I have to explicitly add the card. It’s fully identified (along with the sound portion of the device), and passed through without any problems/warnings/etc, and obviously it’s showing up to the OS.

I did move PCIe slots and did have to re-add the card to the VM, and I’m still getting the same error.

Perhaps I’ll try something other than Ubuntu (or a later version).

Feel free to experiment.

http://www.pugetsystems.com/labs/articles/Multi-headed-VMWare-Gaming-Setup-564/

“After a lot of effort, NVIDIA eventually told us that PCI passthrough is simply not supported on GeForce cards and that they have no plans to add it in the immediate future.”

The same site does link to a proposed method for enabling pass-through with GeForce on KVM. YMMV

Thanks txbob!

I gave up on ESXi; it was definitely a VMware issue, not NVIDIA.

I went with Proxmox VE (KVM) and successfully have it passing through to an Ubuntu 14.04 VM now. Took a few hours but deviceQuery and bandwidthTest are working just fine.