problem: GPGPU on Xen kernels
nvidia drivers do not seem to work on xen kernels

thunderbird · May 6, 2011, 2:28pm

Hi guys,

I spent the whole week trying to set up an Ubuntu 10.04 64-bit machine for Xen virtualization (Xen 4.1), to run GPGPU tests on several virtual machines using Xen’s GPU passthrough capability. My problem is that I just can’t get the NVIDIA development driver to work on the Xen kernel (it works perfectly fine with the standard kernel).

For compiling the kernel, I followed the instructions from http://www.zeroaccess.org/2011/04/xen-4-1-on-ubuntu-10-04-64bit/ but built the kernel the Debian way (make-kpkg) to get a nice .deb package. It boots fine, and I modified GRUB to set the kernel options, etc. All the Xen setup seems to be working as it should.

I can’t get the NVIDIA driver working on the host (Dom0) though. I tried the driver with CUDA 3.2, and also the 4.0 RC2 driver (for a GTX 590 card). When the system tries to start X11, the screen turns blank and the system gets very slow. Booting the system into text mode works fine. I can load the nvidia driver manually (modprobe nvidia) and create the device nodes in /dev using mknod (so I have /dev/nvidia0, /dev/nvidia1 and /dev/nvidiactl, with major number 195 and minors 0, 1 and 255, respectively). When I try to build anything using OpenCL, it just reports that no platforms have been found. With CUDA, I get the error: “cudaSafeCall() Runtime API error : invalid device ordinal.” Both work completely fine when I boot the system into a standard kernel (linux-image-generic, the default with Ubuntu 10.04).
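For reference, the manual device-node step described above could be sketched roughly as follows. This is only an illustration in Python, not what was actually run: the helper names `nvidia_device_nodes` and `create_nodes` are made up for this example, the major and minor numbers are the ones quoted above, and the 0666 permissions are an assumption.

```python
import os
import stat

NVIDIA_MAJOR = 195       # major number quoted in the post
NVIDIACTL_MINOR = 255    # minor number quoted for /dev/nvidiactl


def nvidia_device_nodes(gpu_count):
    """Return (path, major, minor) for one node per GPU plus the control node."""
    nodes = [(f"/dev/nvidia{i}", NVIDIA_MAJOR, i) for i in range(gpu_count)]
    nodes.append(("/dev/nvidiactl", NVIDIA_MAJOR, NVIDIACTL_MINOR))
    return nodes


def create_nodes(gpu_count, mode=0o666):
    """Create any missing character device nodes. Must be run as root."""
    for path, major, minor in nvidia_device_nodes(gpu_count):
        if not os.path.exists(path):
            os.mknod(path, mode | stat.S_IFCHR, os.makedev(major, minor))
            os.chmod(path, mode)  # mknod honours the umask, so set the mode explicitly
```

Calling `create_nodes(2)` as root would correspond to the three nodes listed above (the GTX 590 shows up as two devices). Creating the nodes only gives user space something to open; it does not by itself make the kernel module initialize the card.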

The X11 log just says that it failed to load the NVIDIA module. Syslog gives messages like “NVRM: RmInitAdapter failed!”.

I tried various suggestions for installing the driver found on the web (e.g., http://wiki.xensource.com/xenwiki/NvidiaGPU?highlight=(nvidia) ) but with no success.

Did any of you get NVIDIA and Xen to work together? How? Any help is appreciated!

mikejc · May 8, 2011, 2:08am

The NVIDIA drivers do not work with an Ubuntu dom0, at least not with any of the kernels and Xorg versions that come with Ubuntu.
NVIDIA drivers do work with a dom0 based on openSUSE 11.3 and 11.4, which I am currently using. You could also take the master kernel archive from openSUSE, build that on Ubuntu, and possibly get it to work on your system. But if you want the easiest route, I’d go with openSUSE: Xen just installs. To install the NVIDIA driver while running under Xen, all you have to do is “export IGNORE_XEN_PRESENCE=1” before you run the .sh file.
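If the install is scripted rather than typed into a shell, the same idea is to put the variable into the installer's environment. A minimal Python sketch, assuming the installer is a shell script at a path you supply; `installer_env`, `run_installer` and `installer_path` are hypothetical names used only for this example:

```python
import os
import subprocess


def installer_env(base_env=None):
    """Return a copy of the environment with IGNORE_XEN_PRESENCE=1 added."""
    env = dict(os.environ if base_env is None else base_env)
    env["IGNORE_XEN_PRESENCE"] = "1"
    return env


def run_installer(installer_path):
    """Run the installer script with the variable set (installer_path is hypothetical)."""
    return subprocess.run(["sh", installer_path], env=installer_env(), check=True)
```

This is equivalent to exporting the variable in the shell first. It only changes what the installer sees, not the environment of the calling process.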

thunderbird · May 10, 2011, 10:22am

Thanks for the quick reply. I did as you said and switched to openSUSE 11.4 (64-bit). I got the NVIDIA drivers working on the native (non-Xen) kernel and all SDK examples work with no problem. I used the IGNORE_XEN_PRESENCE trick and installed them on the Xen kernel as well. The X server works fine with this driver on the Xen Dom0. But when I run any of the CUDA SDK examples, I get an error telling me that all CUDA devices are busy or unavailable. I also tried this in text mode, to be sure that nothing in the X server causes it, with the same results. The OpenCL examples just return an out_of_resources error. They all detect the GTX 590 graphics card fine, however. Do you have any ideas what might be wrong?

I am using the 270.41.6 driver now, and installed it under Xen as described here: http://old-en.opensuse.org/Talk:Use_Nvidia_driver_with_Xen (last post at the bottom). I also installed the NVIDIA driver for the native kernel from openSUSE’s community repository, so that all the libraries and header files are where they should be. So I just placed a manually built nvidia.ko into /lib/modules/xxx-xen; the user-space part is already there (versions match).

laika · May 17, 2011, 4:41am

I’m currently suffering from a similar problem. My environment is CentOS 5.5.

I also got the kernel warning messages below.

May 17 00:16:21 localhost kernel: NVRM: bad caching on address 0xffff8805b8aa5000: actual 0x77 != expected 0x73
May 17 00:16:21 localhost kernel: NVRM: please see the README section on Cache Aliasing for more information
May 17 00:16:21 localhost kernel: NVRM: bad caching on address 0xffff8805b8aa6000: actual 0x77 != expected 0x73
May 17 00:16:22 localhost kernel: NVRM: bad caching on address 0xffff8805ba686000: actual 0x77 != expected 0x73
May 17 00:16:22 localhost kernel: NVRM: bad caching on address 0xffff8805bb754000: actual 0x67 != expected 0x63
May 17 00:16:22 localhost kernel: NVRM: bad caching on address 0xffff8805bbad3000: actual 0x77 != expected 0x73
May 17 00:16:22 localhost kernel: NVRM: bad caching on address 0xffff8805bada7000: actual 0x77 != expected 0x73
May 17 00:16:22 localhost kernel: NVRM: bad caching on address 0xffff8805ba806000: actual 0x77 != expected 0x73
May 17 00:16:22 localhost kernel: NVRM: bad caching on address 0xffff8805ba7e0000: actual 0x77 != expected 0x73
May 17 00:16:22 localhost kernel: NVRM: bad caching on address 0xffff8805bb757000: actual 0x77 != expected 0x73
May 17 00:16:22 localhost kernel: NVRM: bad caching on address 0xffff8805ba649000: actual 0x77 != expected 0x73
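To see what these warnings have in common, a small script can pull out the actual and expected values and XOR them. The sketch below assumes the message format shown in the lines above; `parse_bad_caching` and `differing_bits` are illustrative names, not part of any NVIDIA tool.

```python
import re

# Matches the "bad caching" warnings in the format quoted above.
BAD_CACHING = re.compile(
    r"NVRM: bad caching on address (0x[0-9a-fA-F]+): "
    r"actual (0x[0-9a-fA-F]+) != expected (0x[0-9a-fA-F]+)"
)


def parse_bad_caching(line):
    """Return (address, actual, expected, actual XOR expected), or None if no match."""
    match = BAD_CACHING.search(line)
    if match is None:
        return None
    address, actual, expected = (int(group, 16) for group in match.groups())
    return address, actual, expected, actual ^ expected


def differing_bits(lines):
    """Collect the distinct XOR masks over all matching lines."""
    parsed = (parse_bad_caching(line) for line in lines)
    return {entry[3] for entry in parsed if entry is not None}
```

For the lines quoted here, both pairs (0x77 vs 0x73 and 0x67 vs 0x63) differ only in 0x04. If these values are the low flag bits of an x86 page-table entry, which is an assumption, bit 2 is the User/Supervisor flag. That narrows down what the driver is objecting to, but it does not by itself explain why CUDA fails.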

Any suggestions?
