Blank Screen at boot 520.61.05 ACPI BIOS Error RHEL 8 Alienware R13 RTX 3070

NVIDIA Corporation GA104 [GeForce RTX 3070 Lite Hash Rate] (rev a1)
Rocky Linux 8.6 (RHEL 8.6 binary compatible)
Using HPC drivers from:

Alienware R13 with latest 1.7.0 BIOS.

Detailed description:
The previous kernel, 4.18.0-372.19.1, worked; however, on 4.18.0-372.26.1 the nvidia driver doesn’t work with the precompiled nvidia kmod.

Upon reboot with the GUI, the computer just arrives at a black screen. I’m unable to use Ctrl+Alt+F2, etc. to get a terminal when this occurs. When I booted the computer to a multi-user terminal, I obtained the attached nvidia bug report. Additionally, when executing nvidia-smi, some error messages appear on the screen. I’ve captured these errors in the attached nvidia-dmesg file, and nvidia-smi.error shows the command responding with its typical view after those error messages are displayed.

Adding nomodeset at boot doesn’t help with the GUI loading.

Uninstalling the nvidia driver is the only thing that allows the GUI to display and function as normal. Computer has the latest BIOS and I’ve used the HPC driver on many computers without this issue. Any help is appreciated, thanks.

nvidia-bug-report.log.gz (560.0 KB)
nvidia-dmesg (113.6 KB)
nvidia-smi.error (1.5 KB)

To make sure this log file includes as much relevant information as possible, please start the X server with `startx -- -logverbose 6` and run `nvidia-bug-report.sh` after the problem has occurred. If X can not be started or the machine appears to have crashed, please check if you can log into it remotely (e.g. via ssh) and run `nvidia-bug-report.sh` in the remote shell, if possible.

After booting into multi-user.target, I ran `startx -- -logverbose 6` and `nvidia-bug-report.sh`:

$ systemctl set-default multi-user.target
$ reboot
$ startx -- -logverbose 6

The screen went blank and the machine locked up, so I had to ssh in remotely and generate the log:

$ nvidia-bug-report.sh

The produced file is attached.
nvidia-bug-report.log.gz (393.7 KB)

I should note this machine is UEFI with secure boot enabled. The nvidia MOK is enrolled.
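For anyone wanting to double-check the same state on their own machine, here is a minimal sketch. `mokutil --sb-state`, `mokutil --list-enrolled` and `modinfo -F signer` are standard options; the helper name `sb_enabled` is hypothetical, and the sketch assumes `mokutil --sb-state` prints a line beginning with "SecureBoot enabled" when Secure Boot is on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if `mokutil --sb-state` output (read from
# stdin) reports Secure Boot as enabled.
sb_enabled() {
    grep -q '^SecureBoot enabled'
}

# Usage on the affected machine (assumes mokutil is installed):
#   mokutil --sb-state | sb_enabled && echo "Secure Boot is on"
#   mokutil --list-enrolled | grep -i 'subject:'   # enrolled MOK certificates
#   modinfo -F signer nvidia                       # signer of the installed kmod
```

If the signer reported by `modinfo` does not correspond to an enrolled certificate, the module would be rejected at load time, which is a different failure from the black screen described here.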

This was with stream 520:
$ sudo dnf module install nvidia-driver:latest

kmod-nvidia-520.61.05-4.18.0-372.26.1        x86_64        3:520.61.05-3.el8_6         @cuda-rhel8-x86_64         67 M
nvidia-driver                                x86_64        3:520.61.05-1.el8           @cuda-rhel8-x86_64         64 M

The issue doesn’t appear to occur with stream 515:

$ sudo dnf remove nvidia-driver -y
$ sudo dnf module reset nvidia-driver
$ sudo dnf module install nvidia-driver:515

Attaching the nvidia-bug-report.log.gz with a successful boot of stream 515.
nvidia-bug-report.log.gz (262.1 KB)

Going to try to install Windows on the device and see if the video card has a firmware update available.

The driver versions 520.61.05 and 515.76 have a bug. Please check if either 515.65.01 or 520.56 are still available or downgrade to 470 meanwhile.

I forgot: another workaround is to use DisplayPort instead of HDMI.
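A small sketch for checking whether an installed driver is one of the two versions named above. The function name `is_affected_version` is hypothetical; `modinfo -F version nvidia` is the standard way to read the version of the installed kernel module.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given driver version is one of the
# two releases named in this thread as affected (520.61.05 and 515.76).
is_affected_version() {
    case "$1" in
        520.61.05|515.76) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (assumes the nvidia kernel module is installed):
#   ver="$(modinfo -F version nvidia)"
#   if is_affected_version "$ver"; then echo "$ver is affected"; fi
```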

Thanks for the response, 515.65.01 worked as previously noted.

However, we were using a DisplayPort connection with 520.61.05, and the new release 520.56.06 notes: “Fixed a regression in 515.76 that caused blank screens and hangs when starting an X server on RTX 30 series GPUs in some configurations where the boot display is connected via HDMI.”

We’ll try to test 520.56.06 to verify it corrects the exhibited behavior.
https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/precompiled/

I also have problems with DisplayPort.

When I boot the machine, whether from a cold boot or a reset, the text on screen freezes while systemd is starting, but the system keeps booting in the background. After a minute (to make sure the machine has booted), I blindly type my login and password and then launch “startwayland” (my Wayland session launcher script); the display recovers and my desktop appears.

The same thing happens at halt/poweroff. After I log out of the session, the display goes black while the system finishes its work and then shuts down.

This happens with kernel 6.0.2-arch1-1 and the 520.61.05 driver. The boot log/journalctl shows no nvidia driver errors.

Greetings

No obvious display issue when using 520.61.06 installed from the runfile. However, I did see some errors in dmesg even though the X GUI environment displays.

[   42.733575] No UUID available providing old NGUID
[   42.738806] No UUID available providing old NGUID
[   42.905101] r8169 0000:03:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
[   89.554568] ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210604/nsarguments-68)
[   89.755408] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.755425] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.755434] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.755522] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.755534] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.755543] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.755629] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.755641] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.755649] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.755735] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.755754] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.755769] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.755925] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.755938] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.755946] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.756032] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.756044] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.756052] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.756138] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.756150] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.756158] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.756244] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.756256] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.756264] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.756349] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.756361] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.756370] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.756455] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.756467] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.756475] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.756561] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.756573] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.756581] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.756667] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.756678] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.756687] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.756800] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.756826] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.756845] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.757005] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.757028] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.757037] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)
[   89.757122] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210604/dsfield-185)
[   89.757134] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210604/dswload2-478)
[   89.757143] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210604/psparse-531)

Nvidia bug report attached:
nvidia-bug-report.log.gz (311.1 KB)
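Since the ACPI messages above repeat many times with only the timestamp changing, a short sketch like the following can condense them into one line per distinct message with a count. The function name `summarize_dmesg` is hypothetical; it assumes the usual `[seconds.micros]` dmesg timestamp prefix shown in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the leading "[   89.755408] " timestamp from
# each kernel log line on stdin, then count how often each distinct
# message occurs, most frequent first.
summarize_dmesg() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] //' | sort | uniq -c | sort -rn
}

# Usage (dmesg may require root on systems with restricted kernel logs):
#   dmesg | grep 'ACPI' | summarize_dmesg
```

This makes it easier to see that the log consists of a few distinct errors on `\_SB.PC00.PEG1.PEGP._DSM` repeated over and over, rather than many different failures.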