Problem with Quadro K3100M KVM Passthrough

Hi,

We have an HPE blade workstation system, a WS480 Gen9 with the Nvidia 6x GPU sidecar, that was working with XenServer and sharing the GPUs out to 6 virtual systems.

To save some money, we decided to set up RHEL 7 Server with KVM, pass the graphics cards through to the guests, and install RHEL 7 on the guests.

We have a proof-of-concept system up and running. lspci shows the graphics card in the guest and the driver installs, but when we try to start X using RGS, we get the following error in /var/log/Xorg.0.log:

[   485.933] (EE) NVIDIA(0): Failed to allocate software rendering cache surface: out of
[   485.934] (EE) NVIDIA(0):     memory.
[   485.934] (EE) NVIDIA(0):  *** Aborting ***
[   485.940] (EE) 
[   485.940] (EE) NVIDIA: A GPU exception occurred during X server initialization(EE) 
[   485.940] (EE) 
[   485.940] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[   485.940] (EE) 
[   485.940] (EE) Server terminated with error (1). Closing log file.

We are also getting the following in dmesg:

[    8.632955] NVRM: Xid (PCI:0000:00:09): 69, Illegal Class Error: ChID 0008, Class 00000000, Offset 0000010c, Data 00000000
[    8.970383] NVRM: GPU at PCI:0000:00:09: GPU-3f856369-9e87-188e-4e26-f6f5c2997711
[    8.971100] NVRM: Xid (PCI:0000:00:09): 69, Illegal Class Error: ChID 0008, Class 00000000, Offset 0000010c, Data 00000000
[   12.872943] random: crng init done
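
For anyone comparing runs across kernels or driver versions, a small script can pull the Xid events out of saved dmesg output and count them per device. This is only a sketch: the function names (`parse_xid_lines`, `count_by_xid`) are made up for illustration, and the regular expression assumes the line layout shown in the excerpt above, which may differ with other driver versions.

```python
import re
from collections import Counter

# Assumed line layout, taken from the dmesg excerpt above:
# [    8.632955] NVRM: Xid (PCI:0000:00:09): 69, Illegal Class Error: ...
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid "
    r"\(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<detail>.*)"
)


def parse_xid_lines(text):
    """Return one dict per NVRM Xid line found in the given dmesg text."""
    events = []
    for line in text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                {
                    "timestamp": float(match.group("ts")),
                    "pci": match.group("pci"),
                    "xid": int(match.group("xid")),
                    "detail": match.group("detail").strip(),
                }
            )
    return events


def count_by_xid(events):
    """Count events per (PCI address, Xid number) pair."""
    return Counter((event["pci"], event["xid"]) for event in events)


# Sample input copied from the dmesg output in this post.
SAMPLE = """\
[    8.632955] NVRM: Xid (PCI:0000:00:09): 69, Illegal Class Error: ChID 0008, Class 00000000, Offset 0000010c, Data 00000000
[    8.970383] NVRM: GPU at PCI:0000:00:09: GPU-3f856369-9e87-188e-4e26-f6f5c2997711
[    8.971100] NVRM: Xid (PCI:0000:00:09): 69, Illegal Class Error: ChID 0008, Class 00000000, Offset 0000010c, Data 00000000
[   12.872943] random: crng init done
"""

if __name__ == "__main__":
    for (pci, xid), count in count_by_xid(parse_xid_lines(SAMPLE)).items():
        print(f"PCI {pci}: Xid {xid} seen {count} time(s)")
```

To use it on a live guest, save the output of `dmesg` to a file and pass the file contents to `parse_xid_lines` instead of `SAMPLE`.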

We are installing the 390.87 driver, and here is the nvidia-smi output from the system:

# nvidia-smi 
Wed Sep 26 12:04:07 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K3100M       Off  | 00000000:00:09.0 Off |                  N/A |
| N/A   37C    P0    13W /  N/A |      0MiB /  4037MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Any help would be great!

Let me know if you need any more info!

Thanks!
Joe Giles
nvidia-bug-report.log-ceerws0305a.gz (75 KB)

Did you check with another kernel (4.14+) on the guest?
Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post will reveal a paperclip icon.

Hi Generix,

I will attach it to the original message. I did try another kernel on both the host and the guest.

Here is some additional information.

I get this on the terminal of the guest:

[    8.734794] NVRM: GPU at PCI:0000:00:09: GPU-3f856369-9e87-188e-4e26-f6f5c2997711
[    8.735463] NVRM: Xid (PCI:0000:00:09): 69, Illegal Class Error: ChID 0008, Class 00000000, Offset 0000010c, Data 00000000
[    9.099207] NVRM: GPU at PCI:0000:00:09: GPU-3f856369-9e87-188e-4e26-f6f5c2997711