On a render node with 4 Titan RTX GPUs running Debian 10 I want to have one X screen per GPU, so I can use it with MPI-based render software using OpenGL (ParaView in server mode, with each process taking a different --display :0.<screen> argument). The problem I'm facing is that the X server gets configured with 1 regular Device and 3 GPUDevices, and the latter are apparently not capable of becoming X screens:
[346791.563]
X.Org X Server 1.20.4
X Protocol Version 11, Revision 0
[346791.563] Build Operating System: Linux 4.19.0-10-amd64 x86_64 Debian
[346791.563] Current Operating System: Linux r29n5.lisa.surfsara.nl 4.19.0-12-amd64 #1 SMP Debian 4.19.152-1 (2020-10-18) x86_64
[346791.563] Kernel command line: BOOT_IMAGE=/vmlinuz root=LABEL=root ro
[346791.563] Build Date: 27 August 2020 08:51:48AM
[346791.563] xorg-server 2:1.20.4-1+deb10u1 (https://www.debian.org/support)
[346791.563] Current version of pixman: 0.36.0
[346791.563] Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
[346791.563] Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[346791.563] (==) Log file: "/var/log/Xorg.0.log", Time: Tue Oct 27 14:21:08 2020
[346791.563] (==) Using config file: "/etc/X11/xorg.conf"
[346791.563] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[346791.564] (==) ServerLayout "Layout0"
[346791.564] (**) |-->Screen "Screen0" (0)
[346791.564] (**) | |-->Monitor "Monitor0"
[346791.565] (**) | |-->Device "Device0"
[346791.565] (**) | |-->GPUDevice "Device1"
[346791.565] (**) | |-->GPUDevice "Device2"
[346791.565] (**) | |-->GPUDevice "Device3"
...
[346791.627] (II) NVIDIA GLX Module 450.80.02 Wed Sep 23 00:51:32 UTC 2020
[346791.627] (II) NVIDIA: The X server does not support PRIME Render Offload.
[346791.630] (II) NVIDIA(0): NVIDIA GPU TITAN RTX (TU102-A) at PCI:59:0:0 (GPU-0)
[346791.630] (--) NVIDIA(0): Memory: 25165824 kBytes
[346791.630] (--) NVIDIA(0): VideoBIOS: 90.02.2e.00.0c
...
[346791.630] (EE) NVIDIA(G0): GPU screens are disabled
[346791.630] (EE) NVIDIA(G0): Failing initialization of X screen
[346791.630] (**) NVIDIA(G1): Depth 24, (--) framebuffer bpp 32
[346791.630] (==) NVIDIA(G1): RGB weight 888
[346791.630] (==) NVIDIA(G1): Default visual is TrueColor
[346791.630] (==) NVIDIA(G1): Using gamma correction (1.0, 1.0, 1.0)
[346791.630] (**) NVIDIA(G1): Enabling 2D acceleration
[346791.630] (EE) NVIDIA(G1): GPU screens are disabled
[346791.630] (EE) NVIDIA(G1): Failing initialization of X screen
[346791.630] (**) NVIDIA(G2): Depth 24, (--) framebuffer bpp 32
[346791.630] (==) NVIDIA(G2): RGB weight 888
[346791.630] (==) NVIDIA(G2): Default visual is TrueColor
[346791.630] (==) NVIDIA(G2): Using gamma correction (1.0, 1.0, 1.0)
[346791.630] (**) NVIDIA(G2): Enabling 2D acceleration
[346791.630] (EE) NVIDIA(G2): GPU screens are disabled
[346791.630] (EE) NVIDIA(G2): Failing initialization of X screen
I actually don't want GPU screens (nor PRIME render offload), I want regular X screens. According to Appendix B. X Config Options, the option AllowNVIDIAGPUScreens is a boolean, but I can't seem to get the X server to honour it when it is set to "false" or "no". In fact, the option doesn't even seem to get picked up by the nvidia driver:
[346791.615] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[346791.615] (**) Option "AllowNVIDIAGpuScreens" "false"
[346791.615] (**) NVIDIA(0): Option "ConnectedMonitor" "DFP-0"
I checked the system-wide config files in /usr/share/X11/xorg.conf.d, but nothing relevant seems to be specified there.
Also strange is that the X Config Options appendix linked above mentions that “The NVIDIA X driver will allow GPU screens on X.Org xserver version 1.20.7 and higher”, while we're running 1.20.4. Plus, the X log output above reports that PRIME Render Offload is not even supported by this X server.
Relevant excerpt from the xorg.conf:
Section "ServerLayout"
Identifier "Layout0"
Screen 0 "Screen0"
InputDevice "Keyboard0" "CoreKeyboard"
InputDevice "Mouse0" "CorePointer"
Option "AllowNVIDIAGPUScreens" "false"
EndSection
...
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:59:0:0"
EndSection
Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:94:0:0"
EndSection
Section "Device"
Identifier "Device2"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:177:0:0"
EndSection
Section "Device"
Identifier "Device3"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:217:0:0"
EndSection
...
Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
Option "UseDisplayDevice" "none"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0: /etc/X11/dell-3008wfp.bin"
SubSection "Display"
Depth 24
EndSubSection
EndSection
Section "Screen"
Identifier "Screen1"
Device "Device1"
Monitor "Monitor1"
DefaultDepth 24
Option "UseDisplayDevice" "none"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0: /etc/X11/dell-3008wfp.bin"
SubSection "Display"
Depth 24
EndSubSection
EndSection
Section "Screen"
Identifier "Screen2"
Device "Device2"
Monitor "Monitor2"
DefaultDepth 24
Option "UseDisplayDevice" "none"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0: /etc/X11/dell-3008wfp.bin"
SubSection "Display"
Depth 24
EndSubSection
EndSection
Section "Screen"
Identifier "Screen3"
Device "Device3"
Monitor "Monitor3"
DefaultDepth 24
Option "UseDisplayDevice" "none"
Option "ConnectedMonitor" "DFP-0"
Option "CustomEDID" "DFP-0: /etc/X11/dell-3008wfp.bin"
SubSection "Display"
Depth 24
EndSubSection
EndSection
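For clarity, the layout I would expect to need for four separate X screens (rather than GPU screens) is something like the sketch below, reusing the Screen and Monitor sections already defined above; I haven't verified yet whether referencing all screens in the layout changes how the devices get assigned:

Section "ServerLayout"
    Identifier  "Layout0"
    Screen      0 "Screen0"
    # sketch: reference the remaining screens explicitly, one per device
    Screen      1 "Screen1" RightOf "Screen0"
    Screen      2 "Screen2" RightOf "Screen1"
    Screen      3 "Screen3" RightOf "Screen2"
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
    Option      "AllowNVIDIAGPUScreens" "false"
EndSection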
Setting up one X screen per GPU like this used to work with earlier driver versions, but it seems the recent PRIME support and the related X options make it hard now. We also have another cluster with GPU nodes (2 Tesla K40m per node, NVIDIA driver 455.23.05) where almost exactly the same X config (2 GPU devices instead of 4) does not lead to any GPUDevice being created, but simply to 2 regular Devices, which are then used as 2 X screens. I really don't understand why that is. Are the K40m's too old to support PRIME and therefore not enabled as GPU screens?
Edit: attached nvidia-bug-report.log.gz (2.8 MB)