Keep getting "GPU has fallen off the bus" with 3090 cards on Gigabyte MZ32-AR1 Rev 3.0 motherboard

Thank you for your suggestion. After further testing, nvidia-dkms-575-open also proved unstable and has the same issue. However, this time I noticed a few things: some hours before the GPUs fell off the bus, I started getting hardware acceleration errors when trying to run the mpv player. It could still play videos, though, probably by falling back to software decoding. Also, this time nvidia-bug-report.sh finished without safe mode, so hopefully more debug info was collected.
nvidia-bug-report.log.gz (1.7 MB)

Based on your suggestion to downgrade, I downgraded to 535 to see if it helps (I know you recommended 565, but the issue was still present for me in 550 and 570, so I decided to go further down).

Another interesting find: "RTX 3090: GPU has fallen off the bus (only Linux, on Windows everything is fine)". In that thread, a user reports that an RTX 3090 falls off the bus in Linux but not in Windows, and that setting PowerMizer to Performance mode makes it much rarer than Adaptive mode, which matches my observation. That thread also mentions that 535 has the bug, but I will see how stable it is with Performance mode. I suspect that Performance mode makes the issue less likely rather than solving it, but only time will tell. This does not change the fact that the bug is in the Nvidia driver and is still not fixed even in the latest 575 version.
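For anyone who wants to try the same workaround on every card without clicking through the GUI, here is a minimal sketch. It assumes the `GPUPowerMizerMode` attribute of `nvidia-settings` takes 0 for Adaptive and 1 for Prefer Maximum Performance, which is worth verifying on your driver version. The function names and the GPU indices are mine, purely for illustration.

```python
import subprocess

# Assumed values for the GPUPowerMizerMode attribute of nvidia-settings:
# 0 = Adaptive, 1 = Prefer Maximum Performance. Verify on your driver version.
POWERMIZER_ADAPTIVE = 0
POWERMIZER_MAX_PERFORMANCE = 1


def powermizer_command(gpu_index, mode=POWERMIZER_MAX_PERFORMANCE):
    """Build the nvidia-settings command that assigns a PowerMizer mode to one GPU."""
    return ["nvidia-settings", "-a", f"[gpu:{gpu_index}]/GPUPowerMizerMode={mode}"]


def set_powermizer(gpu_indices, mode=POWERMIZER_MAX_PERFORMANCE):
    """Run the command for each GPU; needs a running X session and nvidia-settings."""
    for index in gpu_indices:
        # check=True raises CalledProcessError if nvidia-settings reports a failure.
        subprocess.run(powermizer_command(index, mode), check=True)
```

Calling `set_powermizer([0, 1])` would apply Performance mode to the first two GPUs. As far as I know the setting does not persist across X restarts, so it has to be reapplied at login.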

When this bug happens, all GPUs fall off the bus. My guess is that something crashes or gets corrupted in kernel space because of the Nvidia driver, hence all PCI-E cards fall off the bus at once. As mentioned before, I tried replacing the motherboard (which was new to begin with), the power supply, etc.; none of it had any effect on the issue. The recent crash, where the Nvidia driver started producing weird errors beforehand, also points to the driver, as do reports from people who see similar symptoms with the same video card only with the Linux Nvidia driver and not on Windows.
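To check whether every GPU logs an error at the moment of the crash, and whether other driver errors show up beforehand, the kernel log can be summarized per PCI address. This is only a rough sketch: it assumes the driver writes Xid events as `NVRM: Xid (PCI:<address>): <code>, ...` and that the crash message contains the text "fallen off the bus". The function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumed log format: "NVRM: Xid (PCI:0000:41:00): 79, ..." (the "PCI:" prefix is optional here).
XID_PATTERN = re.compile(r"NVRM: Xid \((?:PCI:)?([0-9A-Fa-f:.]+)\): (\d+)")
FELL_OFF_TEXT = "fallen off the bus"


def summarize_kernel_log(lines):
    """Return (Xid codes per PCI address in log order, line numbers mentioning the crash)."""
    xids = defaultdict(list)
    fell_off = []
    for number, line in enumerate(lines, start=1):
        match = XID_PATTERN.search(line)
        if match:
            xids[match.group(1)].append(int(match.group(2)))
        if FELL_OFF_TEXT in line:
            fell_off.append(number)
    return dict(xids), fell_off
```

The input could be the saved output of `journalctl -k -b -1` (kernel messages from the previous boot), read with `open(path).read().splitlines()`. If several PCI addresses report an Xid around the same line numbers, that matches the "all cards at once" behaviour described above.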

I hope an Nvidia employee can take a look at the debug log, because this issue seriously affects system stability, and I am using a server-grade motherboard, power supply and online UPS.