Linux 535.xx does not honor ModeValidation, making headless Fullscreen RANDR usage with ConnectedMonitor impossible

NOTE: THIS IS AN IMPORTANT ISSUE AFFECTING ALL USERS WHO WANT TO USE FULLSCREEN APPLICATIONS IN HEADLESS ENVIRONMENTS THROUGH (X)RANDR WITHOUT A MONITOR CONNECTED

Hi,
I am a community developer of the open-source projects https://github.com/selkies-project/docker-nvidia-glx-desktop and https://github.com/selkies-project/docker-nvidia-egl-desktop, two projects that each create a working container with full X11 capabilities without any container privileges. This makes NVIDIA GPU GUI VDI environments possible within shared Kubernetes clusters, and the projects are therefore in high demand from academic and industrial users.


I have reproduced an issue that all of our users on the 535.86.05 driver have also faced: the "NoExtendedGpuCapabilitiesCheck" option in "ModeValidation" for xorg.conf is not honored on GeForce GPUs.

This is a new issue that did not exist in 530.xx, 525.xx, or any earlier driver, and it is reproducible for every user running a headless setup on GeForce (so far, all of 10xx, 20xx, and 30xx GPUs).

How to reproduce: Set ConnectedMonitor to a port with no monitor connected (e.g. DP-0) to enable XRandR, and use Option "ModeValidation" "NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced" so that the Modes pass validation.

I have also turned on Option "ModeDebug" "True" for debugging.
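For anyone who wants to try the reproduction without hand-typing the long option string, here is a minimal sketch that renders such a configuration. The section layout and the identifiers Device0, Monitor0, and Screen0 are assumptions, not taken from the attached xorg.conf.log; placing "ModeValidation" in the Device section and "ConnectedMonitor" in the Screen section is likewise just one plausible arrangement. The ModeLine timings are copied from the ModeDebug output for 1920x1080_60 in this post.

```python
# Hypothetical helper: renders a minimal xorg.conf for the headless repro.
# Identifiers (Device0, Monitor0, Screen0) and section layout are assumptions.
MODE_VALIDATION = [
    "NoMaxPClkCheck", "NoEdidMaxPClkCheck", "NoMaxSizeCheck",
    "NoHorizSyncCheck", "NoVertRefreshCheck", "NoVirtualSizeCheck",
    "NoExtendedGpuCapabilitiesCheck", "NoTotalSizeCheck",
    "NoDualLinkDVICheck", "NoDisplayPortBandwidthCheck",
    "AllowNon3DVisionModes", "AllowNonHDMI3DModes", "AllowNonEdidModes",
    "NoEdidHDMI2Check", "AllowDpInterlaced",
]


def render_xorg_conf(port="DP-0"):
    """Return xorg.conf text that marks `port` as connected with no monitor attached."""
    validation = ", ".join(MODE_VALIDATION)
    return f"""Section "Device"
    Identifier  "Device0"
    Driver      "nvidia"
    Option      "ModeValidation" "{validation}"
    Option      "ModeDebug" "True"
EndSection

Section "Monitor"
    Identifier  "Monitor0"
    ModeLine    "1920x1080_60" 138.50 1920 1968 2000 2080 1080 1083 1088 1111 +HSync -VSync
EndSection

Section "Screen"
    Identifier  "Screen0"
    Device      "Device0"
    Monitor     "Monitor0"
    Option      "ConnectedMonitor" "{port}"
    SubSection  "Display"
        Depth   24
        Modes   "1920x1080_60"
    EndSubSection
EndSection
"""


if __name__ == "__main__":
    print(render_xorg_conf())
```

Calling `render_xorg_conf("DFP-0")` would produce the same file for a different port name.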

Result:

[ 2711.454] (WW) NVIDIA(GPU-0): Validating Mode "1920x1080_60":
[ 2711.454] (WW) NVIDIA(GPU-0): Mode Source: X Configuration file ModeLine
[ 2711.454] (WW) NVIDIA(GPU-0): 1920 x 1080 @ 60 Hz
[ 2711.454] (WW) NVIDIA(GPU-0): Pixel Clock : 138.50 MHz
[ 2711.454] (WW) NVIDIA(GPU-0): HRes, HSyncStart : 1920, 1968
[ 2711.454] (WW) NVIDIA(GPU-0): HSyncEnd, HTotal : 2000, 2080
[ 2711.454] (WW) NVIDIA(GPU-0): VRes, VSyncStart : 1080, 1083
[ 2711.454] (WW) NVIDIA(GPU-0): VSyncEnd, VTotal : 1088, 1111
[ 2711.454] (WW) NVIDIA(GPU-0): Sync Polarity : +H -V
[ 2711.454] (WW) NVIDIA(GPU-0): DualHead Mode: No
[ 2711.454] (WW) NVIDIA(GPU-0): Viewport
[ 2711.454] (WW) NVIDIA(GPU-0): Horizontal Taps
[ 2711.454] (WW) NVIDIA(GPU-0): Vertical Taps
[ 2711.454] (WW) NVIDIA(GPU-0): GPU extended capability check failed.
[ 2711.454] (WW) NVIDIA(GPU-0): Mode "1920x1080_60" is invalid.
[ 2711.454] (WW) NVIDIA(GPU-0):

The full logs are in the attached Xorg.0.log and xorg.conf.log.
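Since the full ModeDebug logs are long, a small script can pull out which modes were rejected and the last message the driver logged before rejecting each one. This is only a sketch: the regular expressions are inferred from the log excerpts in this thread, not from any documented log format, and `rejected_modes` is a hypothetical helper name.

```python
import re

# Patterns inferred from the excerpts in this thread (an assumption, not a spec).
# Both straight and typographic quotes are accepted, since pasted logs may contain either.
PREFIX = re.compile(r'^\[\s*[\d.]+\]\s+\(\S+\)\s+NVIDIA\([^)]*\):\s*')
VALIDATING = re.compile(r'Validating Mode ["“]([^"”]+)["”]')
INVALID = re.compile(r'Mode ["“]([^"”]+)["”] is invalid')


def rejected_modes(log_text):
    """Map each rejected mode name to the last message logged before its rejection."""
    results = {}
    current = None
    last_message = ""
    for raw in log_text.splitlines():
        # Drop the "[timestamp] (WW) NVIDIA(GPU-0):" prefix, keep the message.
        msg = PREFIX.sub("", raw).strip()
        started = VALIDATING.search(msg)
        if started:
            current = started.group(1)
            last_message = ""
            continue
        failed = INVALID.search(msg)
        if failed and failed.group(1) == current:
            results[current] = last_message
            current = None
            continue
        if current and msg:
            last_message = msg
    return results


if __name__ == "__main__":
    import sys
    for mode, reason in rejected_modes(sys.stdin.read()).items():
        print(f"{mode}: {reason}")
```

Assuming the script is saved as, say, `rejected.py`, it could be run as `python3 rejected.py < Xorg.0.log`.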

This behavior contradicts the README documentation and therefore needs to be fixed.


On a separate note, there is a second, non-blocking issue that existed long before the 535 drivers: HDMI and DVI ports are stuck at a 165.0 MHz maximum pixel clock, and "NoMaxPClkCheck" and the related "ModeValidation" options are never honored. This includes the virtual DVI ports on supported Tesla/Datacenter GPUs, where the resolution is capped at 2560 x 1600 at 60 Hz. As a result, headless GPUs with the "ConnectedMonitor" option set to an HDMI or DVI port cannot use Modes above 1920x1200 at 60 Hz.

[2363014.704] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:33:0:0
[2363014.704] (--) NVIDIA(0): DFP-0
[2363014.704] (--) NVIDIA(0): DFP-1
[2363014.704] (--) NVIDIA(0): DFP-2
[2363014.704] (--) NVIDIA(0): DFP-3
[2363014.704] (--) NVIDIA(0): DFP-4
[2363014.704] (--) NVIDIA(0): DFP-5
[2363014.705] (**) NVIDIA(0): Using ConnectedMonitor string "DFP-0".
[2363014.707] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce RTX 3090 (GA102-A) at PCI:33:0:0
[2363014.707] (II) NVIDIA(0): (GPU-0)
[2363014.707] (--) NVIDIA(0): Memory: 25165824 kBytes
[2363014.707] (--) NVIDIA(0): VideoBIOS: 94.02.42.40.34
[2363014.707] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[2363014.711] (--) NVIDIA(GPU-0): DFP-0: connected
[2363014.711] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[2363014.711] (--) NVIDIA(GPU-0): DFP-0 Name Aliases:
[2363014.711] (--) NVIDIA(GPU-0): DFP
[2363014.711] (--) NVIDIA(GPU-0): DFP-0
[2363014.711] (--) NVIDIA(GPU-0): DPY-0
[2363014.711] (--) NVIDIA(GPU-0): HDMI-0
[2363014.712] (--) NVIDIA(GPU-0): HDMI-0
[2363014.712] (--) NVIDIA(GPU-0): Connector-3
[2363014.712] (--) NVIDIA(GPU-0): DFP-0: 165.0 MHz maximum pixel clock
[2363014.712] (--) NVIDIA(GPU-0):

[2363014.714] (WW) NVIDIA(GPU-0): Validating Mode "1920x1440_60":
[2363014.714] (WW) NVIDIA(GPU-0): Mode Source: VESA
[2363014.714] (WW) NVIDIA(GPU-0): 1920 x 1440 @ 60 Hz
[2363014.714] (WW) NVIDIA(GPU-0): Pixel Clock : 234.00 MHz
[2363014.714] (WW) NVIDIA(GPU-0): HRes, HSyncStart : 1920, 2048
[2363014.714] (WW) NVIDIA(GPU-0): HSyncEnd, HTotal : 2256, 2600
[2363014.714] (WW) NVIDIA(GPU-0): VRes, VSyncStart : 1440, 1441
[2363014.714] (WW) NVIDIA(GPU-0): VSyncEnd, VTotal : 1444, 1500
[2363014.714] (WW) NVIDIA(GPU-0): Sync Polarity : -H +V
[2363014.714] (WW) NVIDIA(GPU-0): Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0): mode timings.
[2363014.714] (WW) NVIDIA(GPU-0): GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0): Mode "1920x1440_60" is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):
[2363014.714] (WW) NVIDIA(GPU-0): Validating Mode "1920x1440_75":
[2363014.714] (WW) NVIDIA(GPU-0): Mode Source: VESA
[2363014.714] (WW) NVIDIA(GPU-0): 1920 x 1440 @ 75 Hz
[2363014.714] (WW) NVIDIA(GPU-0): Pixel Clock : 297.00 MHz
[2363014.714] (WW) NVIDIA(GPU-0): HRes, HSyncStart : 1920, 2064
[2363014.714] (WW) NVIDIA(GPU-0): HSyncEnd, HTotal : 2288, 2640
[2363014.714] (WW) NVIDIA(GPU-0): VRes, VSyncStart : 1440, 1441
[2363014.714] (WW) NVIDIA(GPU-0): VSyncEnd, VTotal : 1444, 1500
[2363014.714] (WW) NVIDIA(GPU-0): Sync Polarity : -H +V
[2363014.714] (WW) NVIDIA(GPU-0): Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0): mode timings.
[2363014.714] (WW) NVIDIA(GPU-0): GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0): Mode "1920x1440_75" is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):
[2363014.714] (WW) NVIDIA(GPU-0): Validating Mode "2560x1440_60":
[2363014.714] (WW) NVIDIA(GPU-0): Mode Source: X Configuration file ModeLine
[2363014.714] (WW) NVIDIA(GPU-0): 2560 x 1440 @ 60 Hz
[2363014.714] (WW) NVIDIA(GPU-0): Pixel Clock : 241.50 MHz
[2363014.714] (WW) NVIDIA(GPU-0): HRes, HSyncStart : 2560, 2608
[2363014.714] (WW) NVIDIA(GPU-0): HSyncEnd, HTotal : 2640, 2720
[2363014.714] (WW) NVIDIA(GPU-0): VRes, VSyncStart : 1440, 1443
[2363014.714] (WW) NVIDIA(GPU-0): VSyncEnd, VTotal : 1448, 1481
[2363014.714] (WW) NVIDIA(GPU-0): Sync Polarity : +H -V
[2363014.714] (WW) NVIDIA(GPU-0): Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0): mode timings.
[2363014.714] (WW) NVIDIA(GPU-0): GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0): Mode "2560x1440_60" is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):

The full logs are in the attached Xorg.0_HDMI_525.log and xorg.conf_HDMI_525.log.
The same behavior occurs with 525.60.13 and all other modern driver versions, including 450.xx and earlier.

This second issue also contradicts the README documentation, and it long predates the 535.xx drivers.


Overall, the “ModeValidation” options seem broken, and obtaining XRandR through “ConnectedMonitor” in a headless setup is no longer possible.

My immense thanks for maintaining the Linux drivers and publishing good documentation each release.

I did not include the nvidia-bug-report.log.gz file because this issue has been reproduced by multiple users and is an Xorg driver issue, but I can upload it if needed.

Xorg.0.log (224.4 KB)
Xorg.0_HDMI_525.log (242.2 KB)
xorg.conf.log (2.0 KB)
xorg.conf_HDMI_525.log (2.1 KB)

More related discussion: NVIDIA 535.86 doesn't run headless Xorg servers · Issue #41 · selkies-project/docker-nvidia-glx-desktop · GitHub

@amrits I believe this is worth an internal bug filing.

@ehfd
I have filed bug 4260425 internally for tracking purposes.
I shall try to reproduce the issue locally and will get back to you if any additional information is needed.


Sorry to take your time, but is there any progress on this? @amrits

It seems that only the Datacenter GPUs (A10) work.

I’m having the same problem on an NVIDIA Jetson Orin. I spent all day trying to get it working, only to end up here after realizing the option wasn’t working.

[  3199.428] (WW) NVIDIA(GPU-0):   Validating Mode "2880x1800_60":
[  3199.428] (WW) NVIDIA(GPU-0):     Mode Source: X Configuration file ModeLine
[  3199.428] (WW) NVIDIA(GPU-0):     2880 x 1800 @ 60 Hz
[  3199.428] (WW) NVIDIA(GPU-0):       Pixel Clock      : 442.00 MHz
[  3199.428] (WW) NVIDIA(GPU-0):       HRes, HSyncStart : 2880, 3104
[  3199.428] (WW) NVIDIA(GPU-0):       HSyncEnd, HTotal : 3416, 3952
[  3199.428] (WW) NVIDIA(GPU-0):       VRes, VSyncStart : 1800, 1803
[  3199.428] (WW) NVIDIA(GPU-0):       VSyncEnd, VTotal : 1809, 1865
[  3199.428] (WW) NVIDIA(GPU-0):       Sync Polarity    : -H +V
[  3199.428] (WW) NVIDIA(GPU-0):     Mode is rejected: Unable to construct hardware-specific
[  3199.428] (WW) NVIDIA(GPU-0):     mode timings.
[  3199.428] (WW) NVIDIA(GPU-0):     GPU extended capability check failed.
[  3199.428] (WW) NVIDIA(GPU-0):     Mode "2880x1800_60" is invalid.

This seems to be related to the fact that when a monitor is marked with “ConnectedMonitor” the maximum pixel clock rate is reduced to HDMI 1.0’s 165.0 MHz and the driver subsequently seems to think it will need to use “Display Stream Compression” in order to transmit 4K images. When the card doesn’t offer DSC, this triggers the “extended capability check failed” error.

It’s unclear to me why the driver would enforce dropping the maximum clock rate all the way to HDMI 1.0’s 165.0 MHz. At the very least, I would think there would be an option to set the maximum clock rate for the ConnectedMonitor rather than simply assuming the monitor is only capable of HDMI 1.0. Obviously, the overrides would normally be able to deal with this, but at the same time, dropping the maximum pixel rate this low means everyone trying to use the “ConnectedMonitor” option has to figure out on their own that these two options are also required before they’re able to get high resolutions working. If this is truly the intended behavior, perhaps a note in the documentation explaining this behavior under ConnectedMonitor would be worthwhile?

Thanks.

Do lower resolutions work for you, or does nothing work at all?

On the desktop GPU side, even basic resolutions don’t work in 535.xx.

Yes, lower resolutions work fine on the Jetson Nano, it’s just higher resolutions that won’t work.

Section "Module"
    Disable     "dri"
    SubSection  "extmod"
        Option  "omit xfree86-dga"
    EndSubSection
EndSection

Section "Device"
    Identifier  "Tegra0"
    Driver      "nvidia"
    Option      "ModeValidation" "NoExtendedGpuCapabilitiesCheck,NoVesaModes,NoXServerModes,NoPredefinedModes,AllowNonEdidModes"
    Option      "ModeDebug" "True"
EndSection

Section "Monitor"
    Identifier  "AlwaysOnMonitor"
#    Modeline    "3840x2160"        712.75  3840 4160 4576 5312 2160 2163 2168 2237 -Hsync +Vsync    #NotWorking
#    Modeline    "2880x1800"        442.00  2880 3104 3416 3952 1800 1803 1809 1865 -Hsync +Vsync    #NotWorking
    ModeLine    "1920x1080_60.00"  148.50  1920 2008 2052 2200  1080 1084 1089 1125 +HSync +VSync
    ModeLine    "1600x1200_60.00"  162.00  1600 1664 1856 2160  1200 1201 1204 1250 +HSync +VSync
    ModeLine    "1280x1024_60.02"  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +HSync +VSync
    ModeLine    "1024x768_60.00"    65.00  1024 1048 1184 1344  768 771 777 806 -HSync -VSync
    HorizSync   48-134
EndSection

Section "Screen"
    Identifier  "Jeston"
    Device      "Tegra0" 
    Monitor     "AlwaysOnMonitor"
    SubSection  "Display"
       Depth    24
       Modes    "1920x1080_60.00"
    EndSubSection
    Option      "ConnectedMonitor" "DFP-1"
EndSection
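The ModeLines in this config can be checked against the 165.0 MHz explanation suggested earlier in the thread. The sketch below parses each ModeLine, computes its refresh rate as pixel clock / (HTotal x VTotal), and flags whether the pixel clock fits under the limit. The 165.0 MHz figure is the value the driver logs for the ConnectedMonitor output in this thread; treating it as the deciding factor is a hypothesis, not confirmed driver behavior, and `check_modelines` is a hypothetical helper name.

```python
import re

# Limit the driver logs for the ConnectedMonitor output in this thread.
# Assumption: this is what decides whether a mode is accepted.
MAX_PCLK_MHZ = 165.0

# ModeLine "name" pclk hdisp hsyncstart hsyncend htotal vdisp vsyncstart vsyncend vtotal [flags]
# A leading '#' (commented-out entry) is captured so it can be compared with the result.
MODELINE = re.compile(
    r'^[ \t]*(#?)[ \t]*Mode[Ll]ine[ \t]+"([^"]+)"[ \t]+([\d.]+)((?:[ \t]+\d+){8})',
    re.MULTILINE,
)


def check_modelines(conf_text, limit_mhz=MAX_PCLK_MHZ):
    """Return (name, pclk_mhz, refresh_hz, within_limit, commented_out) per ModeLine."""
    rows = []
    for commented, name, pclk, timings in MODELINE.findall(conf_text):
        values = [int(v) for v in timings.split()]
        htotal, vtotal = values[3], values[7]
        pclk_mhz = float(pclk)
        refresh_hz = pclk_mhz * 1e6 / (htotal * vtotal)
        rows.append((name, pclk_mhz, round(refresh_hz, 2),
                     pclk_mhz <= limit_mhz, bool(commented)))
    return rows


if __name__ == "__main__":
    import sys
    for name, pclk, hz, ok, commented in check_modelines(sys.stdin.read()):
        status = "within limit" if ok else "exceeds limit"
        note = " (commented out)" if commented else ""
        print(f"{name}: {pclk:.2f} MHz, {hz:.2f} Hz, {status}{note}")
```

Fed the Monitor section above, the two entries marked #NotWorking (712.75 MHz and 442.00 MHz) exceed 165.0 MHz, while the four active entries (at most 162.00 MHz) fit under it, which is consistent with the hypothesis.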

We have root-caused the issue and will integrate the fix in a future release.
Thanks for being patient.


Thanks for the news. This would save my open-source project and hundreds of users relying on it.

Did both issues get resolved (the critical one affecting only 535.xx, and the less critical one that also affects all earlier drivers)?

Related: Driver fails to validate 3440x1440@160Hz on Linux, works on Windows

@ehfd
Yes, both issues have been fixed, and the fixes will be integrated in future driver releases.


Users report that this was fixed in 535.129.03 and 545.29.02.
Will post again if there are remaining issues.

Thank you, NVIDIA!
