vGPU boot error

I'm having an issue getting vGPU working. If I assign a shared GPU to any machine, I get the following error when I attempt to boot it:

internal error: xenopsd internal error: Unix.Unix_error (20, "open", "/sys/bus/pci/drivers/nvidia/bind")

There are no other machines using the GPU. Passthrough works fine.

XenServer 6.2 SP1, Cisco UCS C240M3.

Any suggestions?
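
The path in the error is a sysfs file that only exists while a PCI driver named nvidia is registered in the control domain, so a quick first check is whether it is there at all. A minimal sketch, assuming a POSIX shell on the host console; the function name check_bind_path is made up for illustration:

```shell
# Report whether the sysfs bind file for a PCI driver is present.
# $1: path to check (defaults to the path from the error message).
check_bind_path() {
    path=${1:-/sys/bus/pci/drivers/nvidia/bind}
    if [ -e "$path" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# On the host console, run: check_bind_path
```

If this prints "missing", the driver is probably not loaded in the control domain, which is worth confirming with lsmod.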

Hi roodabigman,

Could you please install the latest patch, Hotfix XS62ESP1004 for XenServer 6.2.0 Service Pack 1, and check whether the error persists?

Hi Raja - thanks for the suggestion.

Unfortunately, installing the patch had no effect. I still get the same error when booting.

Thanks roodabigman for the quick check.

Could you please confirm the following?

  1. Which OS version is the VM running?
  2. Are you assigning the vGPU to the VM via XenServer [using command] or XenCenter [using GUI]?
  3. Please provide an NVIDIA bug report by running the nvidia-bug-report.sh script.

Hi Raja,

  1. Windows 7 SP1 64-bit
  2. Assigned via the XenCenter GUI
  3. This showed that the NVIDIA module is not loading correctly: it returns a fatal error (module not loaded)

lsmod | grep nvidia - returns nothing
Memory Mapped I/O above 4GB is already disabled
dmesg | grep NVIDIA - returns nothing

I will have to troubleshoot why the XenServer host is not loading the module.

Try uninstalling and reinstalling the vGPU driver (be sure it is the driver you downloaded from our site), rebooting after each step.

Hi roodabigman,

Any update after re-installing vGPU driver [on HOST]?

I fixed this issue by following the article "How to Resolve GPU Memory Mapping Issues in XenServer" (pasted below).

Change Memory Mapped I/O above 4GB to Disabled. It works.

How to Resolve GPU Memory Mapping Issues in XenServer
CTX139834 Created on Mar 26, 2014 Updated on Apr 02, 2014
Article Topic: Storage, Other
Objective
This article is for customers running XenServer 6.2.0 who are using the 3D Graphics Pack (3DGP) with NVIDIA GRID GPUs, and have problems starting Virtual Machines (VMs) with a virtual GPU (vGPU) created. Customers may find that virtual machines fail to start with a message similar to the following:
Unix.Unix_error(20, "open", "/sys/bus/pci/drivers/nvidia/bind")
This can be caused by the NVIDIA driver not loading in the host’s control domain. To check this, run the following command on the host console:
lsmod | grep ^nvidia
This will return no results if the driver is not loaded.
To find out whether this is caused by the memory mapping issue, run the following command on the host console:
dmesg | grep NVIDIA
Check for messages containing:
"This PCI I/O region assigned to your NVIDIA device is invalid"
If you see this message, it confirms that the GPU has been mapped into memory inaccessible to the host’s control domain. This can be resolved with a change to the BIOS settings.
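
The two checks above can be folded into one small helper. This is only a sketch, assuming a POSIX shell; the function name diagnose_nvidia and its three result labels are invented for illustration and are not part of any NVIDIA or XenServer tooling. It takes the command output as arguments, so it can also be run against saved logs:

```shell
# Classify the state of the NVIDIA host driver from captured command output.
# $1: output of `lsmod`, $2: output of `dmesg`
diagnose_nvidia() {
    lsmod_out=$1
    dmesg_out=$2
    if printf '%s\n' "$lsmod_out" | grep -q '^nvidia'; then
        # A module whose name starts with "nvidia" is listed.
        echo "loaded"
    elif printf '%s\n' "$dmesg_out" | grep -qF 'This PCI I/O region assigned to your NVIDIA device is invalid'; then
        # Driver absent and the kernel log shows the memory mapping message.
        echo "mmio-mapping-issue"
    else
        # Driver absent, but the log does not point at memory mapping.
        echo "not-loaded-other-cause"
    fi
}

# On the host console, run: diagnose_nvidia "$(lsmod)" "$(dmesg)"
```

The third result matters: as this thread shows, the driver can fail to load even when the memory mapping message is absent, in which case the BIOS change below will not help.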
Instructions
The following sample procedure is for a Dell R720 server. For other server types, refer to the vendor documentation.
  1. Reboot the server and enter System Setup (press F2).
  2. Navigate to System BIOS, and then Integrated Devices.
  3. Change Memory Mapped I/O above 4GB to Disabled.
  4. Save the settings and reboot the host. It should now be possible to start VMs with vGPUs.

Thanks all. I believe roodabigman already mentioned that he had disabled the "Memory Mapped I/O above 4GB" option in the SBIOS. Refer to comment #5.

Hi roodabigman,

Could you please confirm whether the "Memory Mapped I/O above 4GB" option is disabled?

Hi Raja,

Sorry for the extended delay. The environment was migrated to a different datacenter and we have had other projects that took priority.

Memory Mapped I/O is and has always been disabled.

A bit of success: we updated the firmware on the hardware (C240M3) to the latest version from Cisco. Now 1 of the 2 servers will boot vGPU :), but the other still gives the same errors for some reason.

We’re going to re-flash the firmware and open both servers up to be sure the hardware configuration is identical between the two. I will let you know if we get it working.
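
One way to check that the two servers really match is to capture the same inventory command on each host and diff the results. A rough sketch, assuming the output of something like lspci -nn or dmidecode has been saved to a file on each host and both files copied to one machine; the function name compare_inventory and the file names are hypothetical:

```shell
# Compare two saved hardware inventories and report whether they match.
# $1, $2: text files holding the same command's output from each host.
compare_inventory() {
    diff "$1" "$2" >/dev/null 2>&1
    case $? in
        0) echo "identical" ;;
        1) echo "different"
           # Show the differing lines; '<' is the first file, '>' the second.
           diff "$1" "$2" ;;
        *) echo "error: could not read one of the files" ;;
    esac
    return 0
}

# Example: run `lspci -nn > hostA.txt` on one host and `lspci -nn > hostB.txt`
# on the other, then: compare_inventory hostA.txt hostB.txt
```

Any difference in card slots, device IDs, or firmware strings would be a place to start looking on the server that still fails.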