GeForce driver problem on CentOS 6.4 with Xen installed

ftarz, how much memory is available in your Xen dom0 environment? The allocation that’s failing defaults to 8 MB, which is awfully small to be exhausting memory. You can set the SoftwareRenderCacheSize option to 1 to get the minimum size for the cache, if that helps.
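For anyone wanting to try this, a minimal sketch of the xorg.conf change follows. It assumes the option goes in the existing "Device" section for the NVIDIA card; the identifier "Device0" is only an example and should match whatever your own xorg.conf uses.

```
Section "Device"
    Identifier "Device0"                      # example name; keep your existing identifier
    Driver     "nvidia"
    Option     "SoftwareRenderCacheSize" "1"  # request the minimum cache size
EndSection
```

Restart the X server after editing the file so the option is picked up.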

you_know_who, your second problem is caused by something in your system cutting power to the second GPU.

Hi aplattner,

Thank you for the reply. So do we suppose setting the “SoftwareRenderCacheSize” option to 1 will resolve the issue? This DOM0 has 16GB of memory available.

Also, why do we think the power is being cut? This never happens using the exact same system and using the older driver. Are we suggesting that the PSU is just arbitrarily killing the power on the 2nd GPU? It seems odd that this would happen only when using the new driver. This has never happened using 319.49.

Thank you again for your thoughts and input.

Hi again. Yesterday and today I tried Xen and NVIDIA again. This time I took CentOS 6.5 (64-bit) and installed NVIDIA 331.67, which worked. Then I installed Xen (HowTos/Xen/Xen4QuickStart - CentOS Wiki), rebooted into the Xen kernel, and installed the NVIDIA driver again for the Xen kernel. It does not work. These are the syslog messages:

NVRM: RmInitAdapter failed! (0x26:0x38:1191)
NVRM: rm_init_adapter failed for device bearing minor number 0
NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5

The full dmesg is here: http://pastebin.com/tcNKDbfg

And here is the Xorg.0.log output:

[    36.547] 
X.Org X Server 1.13.0
Release Date: 2012-09-05
[    36.548] X Protocol Version 11, Revision 0
[    36.548] Build Operating System: c6b9 2.6.32-220.el6.x86_64 
[    36.549] Current Operating System: Linux XenServer-centos 3.10.34-11.el6.centos.alt.x86_64 #1 SMP Fri Mar 28 00:57:43 UTC 2014 x86_64
[    36.549] Kernel command line: ro root=UUID=ccf34c5f-68b6-4f47-ae84-537825127456 nomodeset rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=sk-qwerty rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto LANG=sk_SK.UTF-8 rd_NO_LVM rd_NO_DM rhgb quiet
[    36.551] Build Date: 20 December 2013  12:09:45PM
[    36.551] Build ID: xorg-x11-server 1.13.0-23.1.el6.centos 
[    36.551] Current version of pixman: 0.26.2
[    36.552] 	Before reporting problems, check http://wiki.centos.org/Documentation
	to make sure that you have the latest version.
[    36.552] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[    36.554] (==) Log file: "/var/log/Xorg.0.log", Time: Mon May 19 11:58:56 2014
[    36.556] (==) Using config file: "/etc/X11/xorg.conf"
[    36.556] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[    36.557] (==) ServerLayout "Layout0"
[    36.557] (**) |-->Screen "Screen0" (0)
[    36.557] (**) |   |-->Monitor "Monitor0"
[    36.557] (**) |   |-->Device "Device0"
[    36.557] (**) |-->Input Device "Keyboard0"
[    36.557] (**) |-->Input Device "Mouse0"
[    36.557] (==) Automatically adding devices
[    36.557] (==) Automatically enabling devices
[    36.557] (==) Not automatically adding GPU devices
[    36.558] (**) FontPath set to:
	/usr/share/fonts/default/Type1,
	catalogue:/etc/X11/fontpath.d,
	built-ins
[    36.558] (==) ModulePath set to "/usr/lib64/xorg/modules"
[    36.558] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[    36.558] (WW) Disabling Keyboard0
[    36.558] (WW) Disabling Mouse0
[    36.558] (II) Loader magic: 0x813020
[    36.558] (II) Module ABI versions:
[    36.558] 	X.Org ANSI C Emulation: 0.4
[    36.558] 	X.Org Video Driver: 13.1
[    36.558] 	X.Org XInput driver : 18.1
[    36.558] 	X.Org Server Extension : 7.0
[    36.559] (--) PCI:*(0:1:0:0) 10de:1201:1043:83ae rev 161, Mem @ 0xf4000000/33554432, 0xe8000000/134217728, 0xf0000000/67108864, I/O @ 0x0000e000/128, BIOS @ 0x????????/524288
[    36.559] Initializing built-in extension Generic Event Extension
[    36.560] Initializing built-in extension SHAPE
[    36.560] Initializing built-in extension MIT-SHM
[    36.560] Initializing built-in extension XInputExtension
[    36.561] Initializing built-in extension XTEST
[    36.561] Initializing built-in extension BIG-REQUESTS
[    36.561] Initializing built-in extension SYNC
[    36.562] Initializing built-in extension XKEYBOARD
[    36.562] Initializing built-in extension XC-MISC
[    36.562] Initializing built-in extension SECURITY
[    36.563] Initializing built-in extension XINERAMA
[    36.563] Initializing built-in extension XFIXES
[    36.563] Initializing built-in extension RENDER
[    36.564] Initializing built-in extension RANDR
[    36.564] Initializing built-in extension COMPOSITE
[    36.564] Initializing built-in extension DAMAGE
[    36.565] Initializing built-in extension MIT-SCREEN-SAVER
[    36.565] Initializing built-in extension DOUBLE-BUFFER
[    36.565] Initializing built-in extension RECORD
[    36.566] Initializing built-in extension DPMS
[    36.566] Initializing built-in extension X-Resource
[    36.566] Initializing built-in extension XVideo
[    36.567] Initializing built-in extension XVideo-MotionCompensation
[    36.567] Initializing built-in extension SELinux
[    36.567] Initializing built-in extension XFree86-VidModeExtension
[    36.568] Initializing built-in extension XFree86-DGA
[    36.568] Initializing built-in extension XFree86-DRI
[    36.569] Initializing built-in extension DRI2
[    36.569] (II) "glx" will be loaded by default.
[    36.569] (II) LoadModule: "dri2"
[    36.569] (II) Module "dri2" already built-in
[    36.569] (II) LoadModule: "glamoregl"
[    36.571] (WW) Warning, couldn't open module glamoregl
[    36.571] (II) UnloadModule: "glamoregl"
[    36.571] (II) Unloading glamoregl
[    36.571] (EE) Failed to load module "glamoregl" (module does not exist, 0)
[    36.571] (II) LoadModule: "glx"
[    36.571] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[    36.631] (II) Module glx: vendor="NVIDIA Corporation"
[    36.631] 	compiled for 4.0.2, module version = 1.0.0
[    36.631] 	Module class: X.Org Server Extension
[    36.631] (II) NVIDIA GLX Module  331.67  Fri Apr  4 11:43:47 PDT 2014
[    36.631] Loading extension GLX
[    36.631] (II) LoadModule: "nvidia"
[    36.632] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[    36.637] (II) Module nvidia: vendor="NVIDIA Corporation"
[    36.637] 	compiled for 4.0.2, module version = 1.0.0
[    36.637] 	Module class: X.Org Video Driver
[    36.638] (II) NVIDIA dlloader X Driver  331.67  Fri Apr  4 11:24:40 PDT 2014
[    36.638] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[    36.639] (--) using VT number 7

[    36.658] (II) Loading sub module "fb"
[    36.659] (II) LoadModule: "fb"
[    36.659] (II) Loading /usr/lib64/xorg/modules/libfb.so
[    36.660] (II) Module fb: vendor="X.Org Foundation"
[    36.660] 	compiled for 1.13.0, module version = 1.0.0
[    36.660] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    36.660] (WW) Unresolved symbol: fbGetGCPrivateKey
[    36.660] (II) Loading sub module "wfb"
[    36.660] (II) LoadModule: "wfb"
[    36.660] (II) Loading /usr/lib64/xorg/modules/libwfb.so
[    36.661] (II) Module wfb: vendor="X.Org Foundation"
[    36.661] 	compiled for 1.13.0, module version = 1.0.0
[    36.661] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    36.661] (II) Loading sub module "ramdac"
[    36.661] (II) LoadModule: "ramdac"
[    36.661] (II) Module "ramdac" already built-in
[    36.664] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[    36.664] (==) NVIDIA(0): RGB weight 888
[    36.664] (==) NVIDIA(0): Default visual is TrueColor
[    36.664] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[    36.664] (**) NVIDIA(0): Option "TripleBuffer" "1"
[    36.664] (**) NVIDIA(0): Enabling 2D acceleration
[    45.079] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:1:0:0.  Please
[    45.079] (EE) NVIDIA(GPU-0):     check your system's kernel log for additional error
[    45.079] (EE) NVIDIA(GPU-0):     messages and refer to Chapter 8: Common Problems in the
[    45.079] (EE) NVIDIA(GPU-0):     README for additional information.
[    45.079] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[    45.079] (EE) NVIDIA(0): Failing initialization of X screen 0
[    45.079] (II) UnloadModule: "nvidia"
[    45.079] (II) UnloadSubModule: "wfb"
[    45.079] (II) UnloadSubModule: "fb"
[    45.079] (EE) Screen(s) found, but none have a usable configuration.
[    45.079] 
Fatal server error:
[    45.079] no screens found
[    45.079] (EE) 
Please consult the CentOS support 
	 at http://wiki.centos.org/Documentation
 for help. 
[    45.079] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    45.079] (EE) 
[    45.091] Server terminated with error (1). Closing log file.

I tried Ubuntu before (5 posts above) and the result is the same.

NVIDIA, please, please fix it.
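When posting long logs like the one above, it can help to pull out just the error lines. The following is a minimal Python sketch, assuming the usual Xorg log layout of a bracketed timestamp followed by an "(EE)" marker; the function name and the sample text are illustrative only.

```python
import re

# Matches the "[    45.079] " timestamp prefix that Xorg puts on most lines.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def error_lines(log_text):
    """Return the (EE) lines of an Xorg log with timestamps removed."""
    found = []
    for line in log_text.splitlines():
        body = TIMESTAMP.sub("", line)
        if body.startswith("(EE)"):
            found.append(body)
    return found


# Abbreviated excerpt from the log above, used only as sample input.
sample = """\
[    36.664] (**) NVIDIA(0): Enabling 2D acceleration
[    45.079] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[    45.079] (EE) NVIDIA(0): Failing initialization of X screen 0
[    45.079] (II) UnloadModule: "nvidia"
"""

for line in error_lines(sample):
    print(line)
```

To run it against a real log, read the file first, e.g. `error_lines(open("/var/log/Xorg.0.log").read())`.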

Hello People:

Can anyone help us to get this resolved? I am still having the same problem with the latest kernel and the latest NVIDIA drivers. I have tried adjusting the “SoftwareRenderCacheSize” option but I am still not having any luck.

Any ideas or thoughts would be much appreciated.

Hi guys,

I’m facing the same issue on my Ubuntu 12.04 with a GeForce GTX 750 Ti video card and Xen 4.1.
I tried the 334.21 and the latest 337.25 drivers but had no luck.

Has anybody managed to solve this problem? Unfortunately I don’t have any driver version that I can fall back to. 319.49 doesn’t work for me as it is not compatible with the GTX 750 Ti.

Please advise.
Sergey

Hi Sergey:

Thank you for posting. I am still having the same problem and no luck. Like I mentioned, I tried working with the “SoftwareRenderCacheSize” parameter but I was never able to get it working again.

I ended up having to roll back to a plain x86_64 kernel and not use the Xen hypervisor. I’m sad about this because I would rather use the hypervisor, and it was great that the driver was working with it.

Hopefully there will be enough people that come along and post and somebody might see the forum and have the knowledge to know how to get it fixed.

Not sure this helps, but it might be worth trying: http://wiki.centos.org/HowTos/Xen/NvidiaWithXen.

The Nvidia driver wouldn’t normally install on Xen, but the above work-around might work. I haven’t tried it, but users on the Xen user mailing list confirmed that there is a workaround for Nvidia under dom0.

When will Nvidia fix this issue? For some reason the Nvidia driver doesn’t install under Xen, or refuses to work when detecting Xen. This is why the driver needs to be compiled specifically for Xen. By the way, AMD has no such issue and works fine under Xen.

Hope you get it working.

Hi powerhouse64:

Thank you for the post. Also, thank you for sharing the information about AMD. This is an interesting tidbit.

Yes, compiling the module with “IGNORE_XEN_PRESENCE=y” is widely known, and as far as I know it’s the only way to get the driver working under Xen; it’s the only way I have ever had any success. This has been a known issue for a long time, and I would bet that all the Xen users posting on this page install the NVIDIA driver this way. Without it, one cannot even load the module under Xen (modprobe will not load the nvidia module, and lsmod will not show the nvidia kernel module as loaded). So I am sure that the users on the Xen user mailing list are correct - I have loaded the nvidia kernel module like this for years without ever having any problems.

It’s important to point out, however, that this isn’t the problem we’re dealing with in this thread; the problem we’re describing here is completely different. The problem I am experiencing, the same one the original poster reported, includes something similar to the following error message:

[   563.644] (EE) NVIDIA(0): Failed to allocate sofware rendering cache surface: out of
[   563.644] (EE) NVIDIA(0):     memory.
[   563.644] (EE) NVIDIA(0):  *** Aborting ***
[   564.091] (EE)
Fatal server error:
[   564.091] (EE) AddScreen/ScreenInit failed for driver 0
[   564.091] (EE)
[   564.091] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[   564.091] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[   564.091] (EE)
[   564.146] (EE) Server terminated with error (1). Closing log file.

It’s not that the driver is being ignored or going unrecognized - it’s definitely getting loaded. Instead, it’s failing with the “sofware” error message (notice that there’s no “t” in “sofware”, as other posters have pointed out).

The module definitely gets loaded using the modprobe command (or automatically using modprobe.d); lsmod and the log clearly show that the module gets loaded successfully. But when the X server is started, it produces a black screen and also renders any getty consoles unviewable, likewise with a black screen.
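For anyone who wants to confirm the same thing on their own machine, lsmod takes its information from /proc/modules, where each line starts with the module name. Below is a minimal Python sketch of that check; the function names and the sample lines are illustrative, not output from my system.

```python
def module_loaded(proc_modules_text, name="nvidia"):
    """Return True if `name` is listed in /proc/modules-style text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] == name:
            return True
    return False


def nvidia_loaded(path="/proc/modules"):
    """Check the running Linux kernel; call this only on the machine itself."""
    with open(path) as f:
        return module_loaded(f.read())


# Hypothetical sample lines in the /proc/modules layout.
sample = (
    "nvidia 10675249 40 - Live 0x0000000000000000 (PO)\n"
    "xen_blkback 12345 0 - Live 0x0000000000000000\n"
)
print(module_loaded(sample))
```

A True result here only tells you the kernel module is present; as described above, the X server can still fail afterwards.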

I have also tried varying configurations, making adjustments to the xorg configuration file, and again, I am always ending up with the same result.

I just tried it with one of the newer versions of the Linux kernel, but the same problem continues. I have again rolled back to a kernel that cannot be used for a dom0 under the hypervisor to get video back.

Hopefully we can get some knowledgeable people from NVIDIA to tell us what this means and what should be done for it to get fixed.

I had a similar problem, and as far as I can tell, the trick is to use an older driver rather than a newer one. I am using an 8800GT card, and the most recent driver I managed to get working without problems is the 319.xx series (the latest of those is 319.82). All later drivers result in an Xorg crash/freeze as soon as it starts, before anything is drawn on the screen. The only problem is that the latest kernel the 319.xx drivers work with is, IIRC, 3.10.x. You will need to patch the driver to make it work on later kernels.

Same problem here. Xubuntu 14.04.1, GeForce GTX 780 Ti, driver 331.38 (the one packaged by Ubuntu). I installed the machine this morning. The graphics work fine when booted natively, but when I reboot with Xen and Linux as Dom0 it fails. As soon as the X server starts, the screen goes blank and none of the text consoles are accessible. It reports the same out-of-memory error message:

...
[    21.757] (II) NVIDIA: Using 3072.00 MB of virtual memory for indirect memory
[    21.757] (II) NVIDIA:     access.
[    22.155] (II) NVIDIA(0): Setting mode "DFP-0:nvidia-auto-select,DFP-3:nvidia-auto-select,DFP-4:nvidia-auto-select"
[    22.311] (EE) NVIDIA(0): Failed to allocate sofware rendering cache surface: out of
[    22.311] (EE) NVIDIA(0):     memory.
[    22.311] (EE) NVIDIA(0):  *** Aborting ***
[    22.393] (EE)
Fatal server error:
[    22.393] (EE) AddScreen/ScreenInit failed for driver 0

I was able to compare this to a log from a successful run when booted without Xen:

...
[  2009.901] (II) NVIDIA: Using 3072.00 MB of virtual memory for indirect memory
[  2009.901] (II) NVIDIA:     access.
[  2010.292] (II) NVIDIA(0): Setting mode "DFP-0:nvidia-auto-select,DFP-3:nvidia-auto-select,DFP-4:nvidia-auto-select"
[  2010.458] Loading extension NV-GLX
[  2010.550] (==) NVIDIA(0): Disabling shared memory pixmaps
[  2010.550] (==) NVIDIA(0): Backing store enabled
[  2010.550] (==) NVIDIA(0): Silken mouse enabled
[  2010.550] (==) NVIDIA(0): DPMS enabled
...

The logs are otherwise basically identical up to that point. Dom0 throws an error while the native boot instead loads NV-GLX and continues setting things up.
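The comparison described above can be roughly automated. This is a minimal Python sketch, assuming both logs use the standard bracketed Xorg timestamps; it strips the timestamps and reports the first line where the two logs differ. The function names are illustrative, and the sample strings are abbreviated excerpts of the two logs.

```python
import re
from itertools import zip_longest

# Matches the "[    21.757] " timestamp prefix on Xorg log lines.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(log_text):
    """Split a log into lines and drop the leading timestamp from each."""
    return [TIMESTAMP.sub("", line) for line in log_text.splitlines()]


def first_divergence(log_a, log_b):
    """Return (index, line_a, line_b) for the first differing line, or None."""
    a = strip_timestamps(log_a)
    b = strip_timestamps(log_b)
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i, x, y
    return None


# Abbreviated excerpts: Xen Dom0 boot versus native boot.
xen_log = """\
[    21.757] (II) NVIDIA: Using 3072.00 MB of virtual memory for indirect memory
[    21.757] (II) NVIDIA:     access.
[    22.311] (EE) NVIDIA(0): Failed to allocate sofware rendering cache surface: out of
"""
native_log = """\
[  2009.901] (II) NVIDIA: Using 3072.00 MB of virtual memory for indirect memory
[  2009.901] (II) NVIDIA:     access.
[  2010.458] Loading extension NV-GLX
"""

print(first_divergence(xen_log, native_log))
```

A plain line-by-line comparison like this is fragile if the two boots log extra or reordered lines; a real diff tool copes better in that case.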

Our current Linux driver does not support Xen. This is already a known issue or unsupported configuration.

I’m glad you are aware that there is this huge regression of “no longer working with Xen” in the driver, but it would have been infinitely more helpful to provide information on:

  1. What the last driver that works with Xen is
  2. When you are likely to get it working again

Hi Sandip,

Thank you for the post.

Yes, I think that it’s a known issue that the driver has never really been supported under Xen. As a side note, I think that the developers at NVIDIA should re-think this position and notice that they do have a loyal customer base in the GNU/Linux community. But this is beside the point.

The real point is that, despite the driver not being supported under Xen, most people, as we have already pointed out, have been able to get it running anyway. So why is there a new error message when it worked with previous versions? Something has obviously changed, and it really looks like there was hasty development, because the spelling in the error message isn’t even correct.

I think that there’s certainly somebody at NVIDIA that could tell us more about this problem and give us a better and perhaps a more technical answer than simply posting that it’s not supported (which we already knew anyway).

Just for the record I also tried driver 340.24 (through the Ubuntu “xorg-edgers” PPA) and it’s got the same problem with the same “Failed to allocate sofware rendering cache surface” message. Hopefully since this is a known issue it will be fixed at some point?

Driver 319.82 does appear to work under Xen Dom0 once a patch is added so that it will compile against a recent kernel, and it does recognize my 780 Ti: http://lists.xen.org/archives/html/xen-devel/2014-07/msg02868.html. I still lose all my text consoles but at least I get X11, so that’s a start. After a quick test I still have lots of problems with OpenGL applications crashing or misbehaving, but that’s probably due to installing the driver manually because for example it doesn’t update the library paths the way Ubuntu expects.

Hi dododge:

Thank you for sharing and chiming in. Like you, I hope somebody from nVidia can help us to identify why the driver fails and help to get it fixed.

I also had a difficult time getting an older driver to work with the newer kernels, and I had similar problems when using the older drivers; for example, they would crash and produced some very nasty artifacts (e.g. using compiz fusion/beryl effects in KDE). This is a Gentoo system, so this behavior might not be confined to Ubuntu. The artifacts were especially nasty - I couldn’t keep running with the older drivers.

Thank you again for the feedback and your post.

Hi,

I have the same problem on Fedora 20 with NVIDIA Quadro FX 880M.
NVIDIA Driver Version: 331.89

[    10.265] (EE) NVIDIA(0): Failed to allocate sofware rendering cache surface: out of
[    10.265] (EE) NVIDIA(0):     memory.
[    10.265] (EE) NVIDIA(0):  *** Aborting ***
[    11.334] (EE) 
[    11.335] (EE) AddScreen/ScreenInit failed for driver 0
[    11.335] (EE) 
[    11.335] (EE) 
[    11.335] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    11.335] (EE) 
[    11.335] (EE) Server terminated with error (1). Closing log file.

Driver 343.13 is also non-functional under Xen Dom0; same problem.

[    14.312] (II) NVIDIA GLX Module  343.13  Thu Jul 31 18:36:09 PDT 2014
...
[    16.842] (EE) NVIDIA(0): Failed to allocate sofware rendering cache surface: out of
[    16.842] (EE) NVIDIA(0):     memory.

FYI: I finally got 319.82 (with the 3.13 kernel patch) packaged and installed “properly” under Xubuntu 14.04, so that the libraries are properly visible to applications. This does let me run the X server under Xen Dom0, but OpenGL applications are still a disaster. glxgears reports ~4 FPS; Unigine Valley brings up its menus but does not display its scenery; the Steam client starts up its main window but chews 100% of a core and is non-responsive to mouse clicks; launching Euro Truck Simulator reconfigures the displays and then segfaults, leaving the desktop in a state that xrandr is unable to fix (but thankfully nvidia-settings can still restore the proper layout). After leaving it screen-locked overnight, when I tried to unlock it one of the screens suddenly went blank and the X server started chewing up CPU time.

If I boot the same system, with the same driver, natively instead of under Xen Dom0, everything seems to be fine. Steam, ETS2, Unigine, glxgears, etc. they all run normally with full acceleration.

So even though the 319.82 driver appears to function under Xen Dom0, there are lots of OpenGL problems that are only present under Xen. It’s probably okay for desktop and development apps, but virtualized OpenGL seems to be a no-go. In my case, the whole point with this machine was to be able to run Linux and Windows with high-speed OpenGL at the same time (using VGA passthrough to give a second card to Windows), so it now looks like I’m going to have to give up on virtualization and just build an entire second machine :-(.

Hi dododge,

Thank you for reporting your findings. Before you give up, you should try nouveau (cringe). All kidding aside, it’s come a long way in recent times.

I am with you - I also want to have the best of both worlds with Xen, guests, and accelerated 3D, but this problem needs to get fixed first. Please note again that 319.82 is an older version and does not produce this new error message. I found the same was true for the 319.49 version that I mentioned in a previous post (which isn’t too far away from the 319.82 you’re testing with here). It’s too bad that NVIDIA has given this issue so little attention.

Is anybody ever going to address the lack of Xen support? It’s 2017 now.