NOTE: THIS IS AN IMPORTANT ISSUE AFFECTING ALL USERS WHO WANT TO USE FULLSCREEN APPLICATIONS IN HEADLESS ENVIRONMENTS THROUGH (X)RANDR WITHOUT A MONITOR CONNECTED
Hi,
I am a community developer of the open-source projects https://github.com/selkies-project/docker-nvidia-glx-desktop and https://github.com/selkies-project/docker-nvidia-egl-desktop, two projects which create a working container with full X11 capabilities without any container privileges. This makes NVIDIA GPU GUI VDI environments possible within shared Kubernetes clusters, and thus the projects are in high demand from academic and industrial users.
I have reproduced an issue that all of our users on the 535.86.05 driver have also faced, where the “NoExtendedGpuCapabilitiesCheck” argument to the “ModeValidation” option in xorg.conf is not honored on GeForce GPUs.
This is a new issue that did not exist in 530.xx, 525.xx, or any earlier driver, and it is reproducible for every user running a headless setup on GeForce (so far, 10xx, 20xx, and 30xx GPUs).
How to reproduce: Set the “ConnectedMonitor” option to a port with no monitor connected (e.g. DP-0) to enable XRandR, and use Option “ModeValidation” “NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced” so that the Modes pass validation.
I have also turned on Option “ModeDebug” “True” for debugging.
Result:
[ 2711.454] (WW) NVIDIA(GPU-0): Validating Mode “1920x1080_60”:
[ 2711.454] (WW) NVIDIA(GPU-0): Mode Source: X Configuration file ModeLine
[ 2711.454] (WW) NVIDIA(GPU-0): 1920 x 1080 @ 60 Hz
[ 2711.454] (WW) NVIDIA(GPU-0): Pixel Clock : 138.50 MHz
[ 2711.454] (WW) NVIDIA(GPU-0): HRes, HSyncStart : 1920, 1968
[ 2711.454] (WW) NVIDIA(GPU-0): HSyncEnd, HTotal : 2000, 2080
[ 2711.454] (WW) NVIDIA(GPU-0): VRes, VSyncStart : 1080, 1083
[ 2711.454] (WW) NVIDIA(GPU-0): VSyncEnd, VTotal : 1088, 1111
[ 2711.454] (WW) NVIDIA(GPU-0): Sync Polarity : +H -V
[ 2711.454] (WW) NVIDIA(GPU-0): DualHead Mode: No
[ 2711.454] (WW) NVIDIA(GPU-0): Viewport
[ 2711.454] (WW) NVIDIA(GPU-0): Horizontal Taps
[ 2711.454] (WW) NVIDIA(GPU-0): Vertical Taps
[ 2711.454] (WW) NVIDIA(GPU-0): GPU extended capability check failed.
[ 2711.454] (WW) NVIDIA(GPU-0): Mode “1920x1080_60” is invalid.
[ 2711.454] (WW) NVIDIA(GPU-0):
Logs are available in the attached Xorg.0.log and xorg.conf.log.
This behavior does not match the README documentation and therefore needs to be fixed.
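For anyone who wants to check their own Xorg.0.log for the same symptom, here is a minimal Python sketch that lists the modes reported as invalid together with the failure lines printed for them. It assumes Option “ModeDebug” is enabled and that the log lines look like the excerpt above; the function name `rejected_modes` and the embedded sample are only illustrative and not part of any NVIDIA tool.

```python
import re

# Strips the "[ timestamp] (WW) NVIDIA(GPU-0): " prefix seen in the excerpt.
PREFIX = re.compile(r'^\[\s*[\d.]+\]\s+\(\S+\)\s+NVIDIA\([^)]*\):\s*')
# Accept both straight and typographic quotes, since forum software
# often converts the quotes found in real log files.
VALIDATING = re.compile(r'Validating Mode\s+["“]([^"”]+)["”]')
INVALID = re.compile(r'Mode\s+["“]([^"”]+)["”]\s+is invalid')


def rejected_modes(log_text):
    """Return {mode name: [failure lines]} for every mode marked invalid."""
    results = {}
    current = None
    reasons = []
    for raw in log_text.splitlines():
        msg = PREFIX.sub("", raw).strip()
        started = VALIDATING.search(msg)
        if started:
            # A new validation block begins; forget the previous one.
            current, reasons = started.group(1), []
            continue
        if current is None:
            continue
        if INVALID.search(msg):
            results[current] = reasons
            current = None
        elif "check failed" in msg or msg.startswith("Mode is rejected"):
            reasons.append(msg)
    return results


# Abbreviated sample taken from the log excerpt in this post.
SAMPLE = """\
[ 2711.454] (WW) NVIDIA(GPU-0): Validating Mode "1920x1080_60":
[ 2711.454] (WW) NVIDIA(GPU-0): Pixel Clock : 138.50 MHz
[ 2711.454] (WW) NVIDIA(GPU-0): GPU extended capability check failed.
[ 2711.454] (WW) NVIDIA(GPU-0): Mode "1920x1080_60" is invalid.
"""

if __name__ == "__main__":
    for name, why in rejected_modes(SAMPLE).items():
        print(name, why)
```

To run it against a real log, read the file (for example /var/log/Xorg.0.log, depending on the setup) and pass its contents to `rejected_modes` instead of `SAMPLE`.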
On a separate note, there is a second issue which is not blocking (it existed long before the NVIDIA 535 drivers): HDMI and DVI ports are stuck at a 165.0 MHz maximum pixel clock, and “NoMaxPClkCheck” and the related “ModeValidation” options are never honored. This includes the virtual DVI ports on supported Tesla/Datacenter GPUs, where the maximum resolution is stuck at 2560 x 1600 at 60 Hz. As a result, headless GPUs with the “ConnectedMonitor” option set to an HDMI or DVI port cannot use Modes above 1920x1200 at 60 Hz.
[2363014.704] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:33:0:0
[2363014.704] (--) NVIDIA(0): DFP-0
[2363014.704] (--) NVIDIA(0): DFP-1
[2363014.704] (--) NVIDIA(0): DFP-2
[2363014.704] (--) NVIDIA(0): DFP-3
[2363014.704] (--) NVIDIA(0): DFP-4
[2363014.704] (--) NVIDIA(0): DFP-5
[2363014.705] (**) NVIDIA(0): Using ConnectedMonitor string “DFP-0”.
[2363014.707] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce RTX 3090 (GA102-A) at PCI:33:0:0
[2363014.707] (II) NVIDIA(0): (GPU-0)
[2363014.707] (--) NVIDIA(0): Memory: 25165824 kBytes
[2363014.707] (--) NVIDIA(0): VideoBIOS: 94.02.42.40.34
[2363014.707] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[2363014.711] (--) NVIDIA(GPU-0): DFP-0: connected
[2363014.711] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[2363014.711] (--) NVIDIA(GPU-0): DFP-0 Name Aliases:
[2363014.711] (--) NVIDIA(GPU-0): DFP
[2363014.711] (--) NVIDIA(GPU-0): DFP-0
[2363014.711] (--) NVIDIA(GPU-0): DPY-0
[2363014.711] (--) NVIDIA(GPU-0): HDMI-0
[2363014.712] (--) NVIDIA(GPU-0): HDMI-0
[2363014.712] (--) NVIDIA(GPU-0): Connector-3
[2363014.712] (--) NVIDIA(GPU-0): DFP-0: 165.0 MHz maximum pixel clock
[2363014.712] (--) NVIDIA(GPU-0):
[2363014.714] (WW) NVIDIA(GPU-0): Validating Mode “1920x1440_60”:
[2363014.714] (WW) NVIDIA(GPU-0): Mode Source: VESA
[2363014.714] (WW) NVIDIA(GPU-0): 1920 x 1440 @ 60 Hz
[2363014.714] (WW) NVIDIA(GPU-0): Pixel Clock : 234.00 MHz
[2363014.714] (WW) NVIDIA(GPU-0): HRes, HSyncStart : 1920, 2048
[2363014.714] (WW) NVIDIA(GPU-0): HSyncEnd, HTotal : 2256, 2600
[2363014.714] (WW) NVIDIA(GPU-0): VRes, VSyncStart : 1440, 1441
[2363014.714] (WW) NVIDIA(GPU-0): VSyncEnd, VTotal : 1444, 1500
[2363014.714] (WW) NVIDIA(GPU-0): Sync Polarity : -H +V
[2363014.714] (WW) NVIDIA(GPU-0): Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0): mode timings.
[2363014.714] (WW) NVIDIA(GPU-0): GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0): Mode “1920x1440_60” is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):
[2363014.714] (WW) NVIDIA(GPU-0): Validating Mode “1920x1440_75”:
[2363014.714] (WW) NVIDIA(GPU-0): Mode Source: VESA
[2363014.714] (WW) NVIDIA(GPU-0): 1920 x 1440 @ 75 Hz
[2363014.714] (WW) NVIDIA(GPU-0): Pixel Clock : 297.00 MHz
[2363014.714] (WW) NVIDIA(GPU-0): HRes, HSyncStart : 1920, 2064
[2363014.714] (WW) NVIDIA(GPU-0): HSyncEnd, HTotal : 2288, 2640
[2363014.714] (WW) NVIDIA(GPU-0): VRes, VSyncStart : 1440, 1441
[2363014.714] (WW) NVIDIA(GPU-0): VSyncEnd, VTotal : 1444, 1500
[2363014.714] (WW) NVIDIA(GPU-0): Sync Polarity : -H +V
[2363014.714] (WW) NVIDIA(GPU-0): Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0): mode timings.
[2363014.714] (WW) NVIDIA(GPU-0): GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0): Mode “1920x1440_75” is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):
[2363014.714] (WW) NVIDIA(GPU-0): Validating Mode “2560x1440_60”:
[2363014.714] (WW) NVIDIA(GPU-0): Mode Source: X Configuration file ModeLine
[2363014.714] (WW) NVIDIA(GPU-0): 2560 x 1440 @ 60 Hz
[2363014.714] (WW) NVIDIA(GPU-0): Pixel Clock : 241.50 MHz
[2363014.714] (WW) NVIDIA(GPU-0): HRes, HSyncStart : 2560, 2608
[2363014.714] (WW) NVIDIA(GPU-0): HSyncEnd, HTotal : 2640, 2720
[2363014.714] (WW) NVIDIA(GPU-0): VRes, VSyncStart : 1440, 1443
[2363014.714] (WW) NVIDIA(GPU-0): VSyncEnd, VTotal : 1448, 1481
[2363014.714] (WW) NVIDIA(GPU-0): Sync Polarity : +H -V
[2363014.714] (WW) NVIDIA(GPU-0): Mode is rejected: Unable to construct hardware-specific
[2363014.714] (WW) NVIDIA(GPU-0): mode timings.
[2363014.714] (WW) NVIDIA(GPU-0): GPU extended capability check failed.
[2363014.714] (WW) NVIDIA(GPU-0): Mode “2560x1440_60” is invalid.
[2363014.714] (WW) NVIDIA(GPU-0):
Logs are available in the attached Xorg.0_HDMI_525.log and xorg.conf_HDMI_525.log.
The same phenomenon occurs with version 525.60.13 and with every other modern driver version, going back to 450.xx and earlier.
This second issue also does not match the README documentation, but it originated long before the 535.xx drivers.
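To make the pixel clock numbers concrete, here is a small Python sketch that takes the timings printed in the logs quoted in this post and compares each pixel clock against the 165.0 MHz limit reported for DFP-0. The names `MODES`, `refresh_hz`, and `modes_over_cap` are only illustrative. The log itself gives “Unable to construct hardware-specific mode timings” as the rejection reason, so the link to the 165.0 MHz cap is my reading of the logs, not something the driver states directly.

```python
# "DFP-0: 165.0 MHz maximum pixel clock", as reported in the log above.
MAX_PCLK_MHZ = 165.0

# name: (pixel clock in MHz, HTotal, VTotal), copied from the ModeDebug output.
MODES = {
    "1920x1080_60": (138.50, 2080, 1111),
    "1920x1440_60": (234.00, 2600, 1500),
    "1920x1440_75": (297.00, 2640, 1500),
    "2560x1440_60": (241.50, 2720, 1481),
}


def refresh_hz(pclk_mhz, htotal, vtotal):
    """Refresh rate implied by a pixel clock and the total frame size."""
    return pclk_mhz * 1e6 / (htotal * vtotal)


def modes_over_cap(modes, cap_mhz=MAX_PCLK_MHZ):
    """Names of the modes whose pixel clock is above the reported limit."""
    return [name for name, (pclk, _, _) in modes.items() if pclk > cap_mhz]


if __name__ == "__main__":
    for name, (pclk, htotal, vtotal) in MODES.items():
        status = "over" if pclk > MAX_PCLK_MHZ else "within"
        print(f"{name}: {pclk:.2f} MHz, "
              f"{refresh_hz(pclk, htotal, vtotal):.2f} Hz, {status} the limit")
```

All three modes rejected on the HDMI port need more than 165.0 MHz, while the 1920x1080_60 ModeLine from the first log stays below it.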
Overall, the “ModeValidation” options seem broken, and obtaining XRandR through “ConnectedMonitor” in a headless setup is no longer possible.
My immense thanks for maintaining the Linux drivers and publishing good documentation each release.
I did not include the nvidia-bug-report.log.gz file because this issue is reproducible across multiple users and is an Xorg driver issue, but I can upload it if needed.
Xorg.0.log (224.4 KB)
Xorg.0_HDMI_525.log (242.2 KB)
xorg.conf.log (2.0 KB)
xorg.conf_HDMI_525.log (2.1 KB)