M60 vGPU with Xorg "(EE) No devices detected"

NathanKidd · August 30, 2016, 9:36pm

We have a new M60 in our Dell R720, VMware ESXi 6.0/vSphere 6. vGPU profiles
work fine with a Windows 10 VM, but not in CentOS (tried 6.8 and 7), where Xorg.0.log
always says (EE) No devices detected. and exits:

[ 13572.243] (II) LoadModule: "glx"
[ 13572.243] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 13572.248] (II) Module glx: vendor="NVIDIA Corporation"
[ 13572.248]    compiled for 4.0.2, module version = 1.0.0
[ 13572.248]    Module class: X.Org Server Extension
[ 13572.248] (II) NVIDIA GLX Module  361.45.09  Tue May 10 08:44:16 PDT 2016
[ 13572.248] (II) LoadModule: "nvidia"
[ 13572.249] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 13572.249] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 13572.249]    compiled for 4.0.2, module version = 1.0.0
[ 13572.249]    Module class: X.Org Video Driver
[ 13572.249] (II) NVIDIA dlloader X Driver  361.45.09  Tue May 10 08:22:21 PDT 2016
[ 13572.249] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 13572.249] (--) using VT number 7

[ 13572.252] (EE) No devices detected.
[ 13572.252] (EE) 
Fatal server error:
[ 13572.252] (EE) no screens found(EE) 
[ 13572.252] (EE)

This is running 361.45.09 drivers which appear okay on the hypervisor side as well as the guest VM.
The guest VM runs nvidia-smi and sees a GRID M60-4Q vGPU profile. xorg.conf was generated by nvidia-xconfig --enable-all-gpus --use-display-device=none. Licensing appears correctly set up. The GPU has been put into graphical mode. Kernel module is loaded, and dmesg shows nothing untoward. The R720 was previously running with a K1 and K2, which are now removed to keep things simple. And to reiterate, the Win10 VM works, with OpenGL renderer string showing the M60 vGPU profile. I’ve exhausted my forum / internet searching.

Anyone have ideas to try?

nvidia-bug-report.log.gz:
https://mft.opentext.com/MFT/Transfer?action=GetFile&name=37b52528-58cc-4060-9cfc-f8d2fb458dcf&TID=e2f8e371-e0a5-40e2-b833-68e5b2c77f14&nojava=true
vmware.log.gz:
https://mft.opentext.com/MFT/Transfer?action=GetFile&name=b9093aab-0050-4c95-a234-373f5874d76b&TID=e2f8e371-e0a5-40e2-b833-68e5b2c77f14&nojava=true
(URLs will expire on 2016-09-13)

Thanks

RachelBerry · August 31, 2016, 10:37am

I don’t think CentOS is officially supported by VMware (and consequently by NVIDIA), so you might want to consider that. CentOS is supported by Citrix Linux VDA at the moment and, given its similarity to RHEL, I would expect it to work from our side. However, you should look carefully at which OSs and which versions VMware/Citrix support.

There are a few common setup issues with CentOS / RHEL detailed in our knowledge base (Find Answers | NVIDIA). Could you have a look at them and see if anything rings a bell?

Rachel

JasonSouthernNV · August 31, 2016, 10:56am

That configuration will run CentOS 7 perfectly well, even if it is not "supported" by every vendor in the stack.

However,

First observation - the M60 is not certified in the Dell R720; you need the R730.

Second - Double check that nouveau is completely disabled and not restarting after you installed the NVIDIA driver. You need to block it in several locations to ensure it’s not capturing the hardware and preventing it being detected properly.
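As a rough way to check the nouveau point above, the sketch below tests whether the module is currently loaded and whether a modprobe.d file blacklists it. The function names are made up for illustration, and the default paths (/proc/modules, /etc/modprobe.d) are the usual ones on CentOS/RHEL rather than anything confirmed in this thread. On those distributions nouveau can also be pulled in from the initramfs, so a blacklist file alone may not be enough.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any NVIDIA tooling.

# Succeeds (exit 0) if nouveau appears in the module list.
# Defaults to /proc/modules, the same data lsmod reads.
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nouveau ' "$modules_file"
}

# Succeeds (exit 0) if any *.conf file in the modprobe.d directory
# contains a "blacklist nouveau" line.
nouveau_blacklisted() {
    conf_dir="${1:-/etc/modprobe.d}"
    grep -qsE '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$conf_dir"/*.conf
}
```

Typical use would be something like `nouveau_loaded && echo "nouveau is still loaded"` followed by `nouveau_blacklisted || echo "no blacklist entry found"`.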

Also, what remoting solution are you using?

You reference vSphere as the underlying hypervisor, but there’s no mention of the remoting solution, without which there are no display devices attached. Horizon should add this to xorg.conf when the agent is installed.

JasonSouthernNV · August 31, 2016, 11:50am

Also, can you confirm that you’re using the Linux driver from the bundle downloaded from

https://nvidia.flexnetoperations.com/control/nvda/login

These are the drivers required for M60.

You should also amend the license settings in gridd.conf (though that’s not directly relevant to this issue).

NathanKidd · August 31, 2016, 3:18pm

Thanks for the replies.

On CentOS support: ah, thanks. The GRID release notes explicitly include CentOS, and that VMware link does include both 6.x and 7 for ESXi 6 U2, but I guess you’re saying vGPU profiles for VMware are a separate support issue. I’ve been unable to separate hypervisor support from Horizon support (which we’re not using, see below) in the VMware links I’ve found. Do you have a more explicit link? Anyway, this is a bit tangential since Jason says it should work on CentOS 7.

On the knowledge base: I’ve reviewed it (and had seen most of those posts already) but don’t see anything relevant.

On the driver bundle: yes, this is where we got the drivers.

On the gridd.conf license settings: also done, and the license server sees that the VM has a license registered.

On the R720: this was my mistake; it is the R730.

Yes, nouveau is blacklisted, nvidia module is loaded.

This is perhaps the issue. We (OpenText) are an ISV with our own remoting solution (ETX). Perhaps this is my misunderstanding, since we previously ran bare metal with K1/K2. Are you saying X.org will not start with the NVIDIA driver unless it has a special xorg.conf? I.e., nvidia-xconfig --use-display-device=none doesn’t work with vGPU like it does on bare metal?

I only want the headless X.org to run, and all remoting is my own.

Thanks.

NathanKidd · September 6, 2016, 4:00pm

After a bunch of testing, the solution is to not use nvidia-xconfig to generate xorg.conf. X.org won’t start with the generated ServerLayout, Monitor and Screen sections (even with UseDisplayDevice "None"). The Device section also needs an explicit BusID entry added.

A minimal working config is, e.g.:

Section "DRI"
	Mode 0666
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GRID M60-4Q"
    BusID          "PCI:2:0:0"
EndSection
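
One detail that can trip people up with the BusID line: lspci prints PCI addresses in hexadecimal as bus:device.function, while xorg.conf expects decimal values in the form PCI:bus:device:function. A minimal sketch of the conversion follows; the function name to_xorg_busid is made up for illustration, and only the default PCI domain (0000) is handled.

```shell
#!/bin/sh
# Hypothetical helper: convert an lspci-style address such as "02:00.0"
# or "0000:0b:00.0" (hex) into the decimal "PCI:bus:device:function"
# form used by the BusID option in xorg.conf.
to_xorg_busid() {
    addr="${1#0000:}"      # drop the default PCI domain prefix if present
    bus="${addr%%:*}"      # text before the first ':'
    rest="${addr#*:}"      # "device.function"
    dev="${rest%%.*}"
    fn="${rest#*.}"
    # $((0x..)) interprets each field as hexadecimal.
    printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$((0x$fn))"
}
```

For example, `to_xorg_busid 02:00.0` gives the PCI:2:0:0 value used above, whereas a device that lspci lists at 0b:00.0 would need PCI:11:0:0.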

JasonJuang · August 14, 2017, 5:47pm

NathanKidd:

After a bunch of testing, the solution is to not use nvidia-xconfig to generate xorg.conf. X.org won’t start with the generated ServerLayout, Monitor and Screen sections (even with UseDisplayDevice "None"). The Device section also needs an explicit BusID entry added.

A minimal working config is, e.g.:
Section "DRI"
	Mode 0666
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GRID M60-4Q"
    BusID          "PCI:2:0:0"
EndSection

I ran into the exact same problem when installing headless xorg for the Amazon g3 instance and this works like magic. Could you share the full working config? Thanks!

NathanKidd · September 10, 2018, 4:09pm

@JasJuang, I didn’t see this post for a year, sorry.

The final solution I used was:

nvidia-xconfig --enable-all-gpus --use-display-device=none --busid=<BUSID>

Value(s) for <BUSID> can be found by running

nvidia-xconfig --query-gpu-info | grep BusID

If you have more than one GPU enabled (as is likely with --enable-all-gpus), you’ll need to manually edit your xorg.conf to give each screen a unique BusID value (the BusID line sits in the Device section that the screen references); the --busid argument above writes the same BusID for every screen.
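
For the multi-GPU case, here is a rough sketch of how the per-GPU BusID values could be turned into Device sections to compare against (or paste into) the generated xorg.conf. The function name emit_device_sections is hypothetical, and the sketch assumes each GPU appears in the nvidia-xconfig --query-gpu-info output on a line whose last field is the bus ID (for example "PCI BusID : PCI:2:0:0"); check the real output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read `nvidia-xconfig --query-gpu-info` output on
# stdin and print one Device section per GPU, each with its own BusID.
# Assumes the bus ID is the last whitespace-separated field on every
# line that contains "BusID".
emit_device_sections() {
    grep 'BusID' | awk '{ print $NF }' | {
        i=0
        while read -r busid; do
            printf 'Section "Device"\n'
            printf '    Identifier     "Device%d"\n' "$i"
            printf '    Driver         "nvidia"\n'
            printf '    BusID          "%s"\n' "$busid"
            printf 'EndSection\n\n'
            i=$((i + 1))
        done
    }
}
```

It would be run as `nvidia-xconfig --query-gpu-info | emit_device_sections`. The Identifier values must match whatever the Screen sections in your xorg.conf reference, so it is usually simpler to copy only the BusID lines into the existing Device sections.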

dictum · October 6, 2023, 1:11am

I have the exact same problem, except that I use Oracle Linux. The above solution did not work for me.
