[SOLVED] Want one X screen per GPU, how to disable usage as GPUDevice (xorg 1.20.4, driver 450.80.02)

On a render node with 4 GTX Titan RTX GPUs running Debian 10, I want to have one X screen per GPU, so I can use it with MPI-based render software using OpenGL (ParaView in server mode, with each process taking a different --display :0.<screen> argument). The problem I’m facing is that the X server gets configured with 1 regular Device and 3 GPUDevices, and the latter are apparently not capable of becoming X screens:
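To illustrate the intended setup: one ParaView server rank per GPU-backed X screen, i.e. screens :0.0 through :0.3. A minimal sketch of the rank-to-screen mapping (the mpirun/pvserver invocation in the comment is an assumption and depends on your MPI and ParaView versions):

```shell
# Map each of 4 local MPI ranks to its own X screen on display :0,
# one screen per GPU.
for rank in 0 1 2 3; do
  echo "rank $rank -> DISPLAY=:0.$rank"
done
# With Open MPI this could become (hypothetical invocation):
#   mpirun -np 4 bash -c 'DISPLAY=:0.$OMPI_COMM_WORLD_LOCAL_RANK exec pvserver'
```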

[346791.563] 
X.Org X Server 1.20.4
X Protocol Version 11, Revision 0
[346791.563] Build Operating System: Linux 4.19.0-10-amd64 x86_64 Debian
[346791.563] Current Operating System: Linux r29n5.lisa.surfsara.nl 4.19.0-12-amd64 #1 SMP Debian 4.19.152-1 (2020-10-18) x86_64
[346791.563] Kernel command line: BOOT_IMAGE=/vmlinuz root=LABEL=root ro
[346791.563] Build Date: 27 August 2020  08:51:48AM
[346791.563] xorg-server 2:1.20.4-1+deb10u1 (https://www.debian.org/support) 
[346791.563] Current version of pixman: 0.36.0
[346791.563]    Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
[346791.563] Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[346791.563] (==) Log file: "/var/log/Xorg.0.log", Time: Tue Oct 27 14:21:08 2020
[346791.563] (==) Using config file: "/etc/X11/xorg.conf"
[346791.563] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[346791.564] (==) ServerLayout "Layout0"
[346791.564] (**) |-->Screen "Screen0" (0)
[346791.564] (**) |   |-->Monitor "Monitor0"
[346791.565] (**) |   |-->Device "Device0"
[346791.565] (**) |   |-->GPUDevice "Device1"
[346791.565] (**) |   |-->GPUDevice "Device2"
[346791.565] (**) |   |-->GPUDevice "Device3"
...
[346791.627] (II) NVIDIA GLX Module  450.80.02  Wed Sep 23 00:51:32 UTC 2020
[346791.627] (II) NVIDIA: The X server does not support PRIME Render Offload.
[346791.630] (II) NVIDIA(0): NVIDIA GPU TITAN RTX (TU102-A) at PCI:59:0:0 (GPU-0)
[346791.630] (--) NVIDIA(0): Memory: 25165824 kBytes
[346791.630] (--) NVIDIA(0): VideoBIOS: 90.02.2e.00.0c
...
[346791.630] (EE) NVIDIA(G0): GPU screens are disabled
[346791.630] (EE) NVIDIA(G0): Failing initialization of X screen
[346791.630] (**) NVIDIA(G1): Depth 24, (--) framebuffer bpp 32
[346791.630] (==) NVIDIA(G1): RGB weight 888
[346791.630] (==) NVIDIA(G1): Default visual is TrueColor
[346791.630] (==) NVIDIA(G1): Using gamma correction (1.0, 1.0, 1.0)
[346791.630] (**) NVIDIA(G1): Enabling 2D acceleration
[346791.630] (EE) NVIDIA(G1): GPU screens are disabled
[346791.630] (EE) NVIDIA(G1): Failing initialization of X screen
[346791.630] (**) NVIDIA(G2): Depth 24, (--) framebuffer bpp 32
[346791.630] (==) NVIDIA(G2): RGB weight 888
[346791.630] (==) NVIDIA(G2): Default visual is TrueColor
[346791.630] (==) NVIDIA(G2): Using gamma correction (1.0, 1.0, 1.0)
[346791.630] (**) NVIDIA(G2): Enabling 2D acceleration
[346791.630] (EE) NVIDIA(G2): GPU screens are disabled
[346791.630] (EE) NVIDIA(G2): Failing initialization of X screen

I actually don’t want GPU screens (nor PRIME render offload); I want regular X screens. According to https://download.nvidia.com/XFree86/Linux-x86_64/450.80.02/README/xconfigoptions.html the option AllowNVIDIAGPUScreens is a boolean, but I can’t get the X server to honor it when it is set to "false" or "no". In fact, the option doesn’t even seem to get picked up by the nvidia driver:

[346791.615] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[346791.615] (**) Option "AllowNVIDIAGpuScreens" "false"
[346791.615] (**) NVIDIA(0): Option "ConnectedMonitor" "DFP-0"

I checked the system-wide config files in /usr/share/X11/xorg.conf.d, but nothing relevant seems to be specified there.

Also strange is that the xconfig page linked above states that “The NVIDIA X driver will allow GPU screens on X.Org xserver version 1.20.7 and higher”, yet we’re running 1.20.4. Moreover, the X log reports that PRIME Render Offload is not even supported by this X server.

Relevant excerpt from the xorg.conf:

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option "AllowNVIDIAGPUScreens" "false"
EndSection
...
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:59:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:94:0:0"
EndSection

Section "Device"
    Identifier     "Device2"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:177:0:0"
EndSection

Section "Device"
    Identifier     "Device3"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:217:0:0"
EndSection
...
Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "UseDisplayDevice" "none"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0: /etc/X11/dell-3008wfp.bin"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "UseDisplayDevice" "none"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0: /etc/X11/dell-3008wfp.bin"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen2"
    Device         "Device2"
    Monitor        "Monitor2"
    DefaultDepth    24
    Option         "UseDisplayDevice" "none"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0: /etc/X11/dell-3008wfp.bin"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen3"
    Device         "Device3"
    Monitor        "Monitor3"
    DefaultDepth    24
    Option         "UseDisplayDevice" "none"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0: /etc/X11/dell-3008wfp.bin"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Configuring the X server this way used to work with earlier driver versions, but the recent PRIME support and the associated X options seem to make it hard now. We also have another cluster with GPU nodes (2 Tesla K40m per node, NVIDIA driver 455.23.05) where an almost identical X config (2 GPU devices instead of 4) does not lead to any GPUDevice being created, but simply 2 regular Devices, which are then used as 2 X screens. I really don’t understand why that is. Are the K40ms too old to support PRIME, and therefore not enabled as GPU screens?

Edit: attached nvidia-bug-report.log.gz (2.8 MB)

Doh! I’m an idiot, only 1 screen is configured in the ServerLayout, so the other three Devices get claimed as GPU devices instead…
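For the record, the fix is to reference all four Screen sections from the ServerLayout. A sketch of the corrected section (screen numbering assumed; the rest of the xorg.conf stays as posted above):

```
Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    Screen      1  "Screen1"
    Screen      2  "Screen2"
    Screen      3  "Screen3"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option "AllowNVIDIAGPUScreens" "false"
EndSection
```

With all four screens referenced, the nvidia driver no longer treats Device1 through Device3 as secondary GPU devices, and each becomes a regular X screen (:0.0 through :0.3).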
