Continued pain getting an X configuration that works with Ampere cards (black screen on boot) + Ubuntu 20.04

Hi,
I have been pulling my hair out just trying to get X to work properly on my machine. I had it working for a while but something changed and now I am at the point of nearly giving up on setting up the configuration.

I have two machines: one that I magically got working with the same display, and another that doesn’t work (on boot I get no X and my display turns off). If I press Ctrl-Alt-F2 to switch to another terminal, I just get a blinking cursor (SSH works fine).

I have tried all the different options I have found online, but nothing seems to work (nomodeset, using xorg.conf.d instead of the default xorg.conf file, stripping out all but one GPU and regenerating nvidia-xconfig, xrandr --auto in ~/.xinitrc, switching to lightdm, force reinstalling gdm3, force reinstalling nvidia-driver-460, blacklisting nouveau).

systemctl status gdm3 shows it’s fine:

● gdm.service - GNOME Display Manager
Loaded: loaded (/lib/systemd/system/gdm.service; static; vendor preset: enabled)
Active: active (running) since Fri 2021-06-25 22:27:41 BST; 9min ago
Process: 1655 ExecStartPre=/usr/share/gdm/generate-config (code=exited, status=0/SUCCESS)
Process: 1671 ExecStartPre=/usr/lib/gdm3/gdm-wait-for-drm (code=exited, status=0/SUCCESS)
Main PID: 1672 (gdm3)
Tasks: 3 (limit: 231646)
Memory: 9.1M
CGroup: /system.slice/gdm.service
└─1672 /usr/sbin/gdm3

The only difference I noticed between the two machines seems to be autodetection of the display. On the working machine I can find this in the log:

/usr/lib/gdm3/gdm-x-session[2813]: (==) NVIDIA(0): No modes were requested; the default m>
/usr/lib/gdm3/gdm-x-session[2813]: (==) NVIDIA(0): will be used as the requested mode.
/usr/lib/gdm3/gdm-x-session[2813]: (==) NVIDIA(0):
/usr/lib/gdm3/gdm-x-session[2813]: (II) NVIDIA(0): Validated MetaModes:
/usr/lib/gdm3/gdm-x-session[2813]: (II) NVIDIA(0): "DFP-7:nvidia-auto-select"
/usr/lib/gdm3/gdm-x-session[2813]: (II) NVIDIA(0): Virtual screen size determined to be 3>
/usr/lib/gdm3/gdm-x-session[2813]: (--) NVIDIA(0): DPI set to (325, 211); computed from ">
/usr/lib/gdm3/gdm-x-session[2813]: (--) NVIDIA(0): option

On the non-working machine, all I get is:

/usr/lib/gdm3/gdm-x-session[1824]: (II) NVIDIA(0): Validated MetaModes:
/usr/lib/gdm3/gdm-x-session[1824]: (II) NVIDIA(0): "NULL"
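
For comparing the two machines, this check can be scripted instead of read by eye. Here is a minimal sketch in Python, assuming the X server messages have been saved as plain text lines in the format quoted above (for example from the journal). The function names `validated_metamodes` and `has_usable_display` are made up for illustration and are not part of any NVIDIA tool. It only looks at the line that follows "Validated MetaModes:" and treats a lone "NULL" as "no display was validated".

```python
import re

# Captures the quoted MetaMode string on the line after "Validated MetaModes:".
# Straight and curly quotes are both accepted, since forum software and
# copy/paste often rewrite the quotes in pasted logs.
_QUOTED = re.compile(r'NVIDIA\(\d+\):\s*["“]([^"”]*)["”]')


def validated_metamodes(log_lines):
    """Return the MetaMode strings that follow each 'Validated MetaModes:' line."""
    modes = []
    expect_mode = False
    for line in log_lines:
        if "Validated MetaModes:" in line:
            expect_mode = True
            continue
        if expect_mode:
            match = _QUOTED.search(line)
            if match:
                modes.append(match.group(1))
            expect_mode = False
    return modes


def has_usable_display(log_lines):
    """True if at least one validated MetaMode is something other than NULL."""
    return any(mode != "NULL" for mode in validated_metamodes(log_lines))


# Sample lines copied from the two logs in this post.
working = [
    '/usr/lib/gdm3/gdm-x-session[2813]: (II) NVIDIA(0): Validated MetaModes:',
    '/usr/lib/gdm3/gdm-x-session[2813]: (II) NVIDIA(0): "DFP-7:nvidia-auto-select"',
]
broken = [
    '/usr/lib/gdm3/gdm-x-session[1824]: (II) NVIDIA(0): Validated MetaModes:',
    '/usr/lib/gdm3/gdm-x-session[1824]: (II) NVIDIA(0): "NULL"',
]

print(has_usable_display(working))  # True
print(has_usable_display(broken))   # False
```

A "NULL" result only says that the driver validated no mode for any display. It does not say why, so the cable, port, and driver version still have to be checked separately.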

On the working machine, I also had a hard time getting X up and running, but ultimately fixed it by putting the “Device” section for the GPU with the display attached first, as Display0. That didn’t make a difference in this case.

Attaching my nvidia-bug-report.sh output in case that helps as well.
Any help figuring this out would be greatly appreciated.

nvidia-bug-report.log.gz (394.5 KB)

Does the screen work at boot and then shut off when the X server starts? The driver isn’t detecting any connected display devices, which is awfully strange if the screen was working at boot.

Do you see the same symptoms if you plug the monitor into a different port on the GPU? If so, do you have another known-working cable you could try?

Oh, and would it be possible to test a newer driver such as 465.31 or the 470.42.01 beta release?

The screen works fine in the BIOS and through the initial boot; it’s only at the X server stage that it stops working.
nvidia-smi works fine as well and detects all the GPUs no problem.

I have tried swapping the ports and the cables. In my case, I can’t get the DisplayPort → HDMI adapter to work on the left two ports of the first GPU, so unfortunately I haven’t been able to test those.

That said, the cable and adapter work on a different machine that I have set up, so at least the cable and adapter don’t appear to be the issue.

Will give the new drivers a try to see if that helps at all.

So it seems to have resolved itself at this point, though I didn’t really do much: I re-installed the hardware and purged and reinstalled the drivers, and things seem to be working again.