Pop!_OS 22.04 LTS x86_64 NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver 550

Hi, I’m not new to Linux, but I am new to NVIDIA on Linux.
I have just finished building my PC and installed Pop!_OS 22.04 (the version with the NVIDIA drivers). When I plugged the HDMI cable into the video card and booted, I got only a grey screen, and I suspect the OS froze.

In my setup there is:
CPU: AMD Ryzen 7 7800X3D
GPU: RTX 4070 super

Using the motherboard’s HDMI output instead, I have removed and reinstalled the drivers many times, following threads about similar problems, but with no luck.

I have disabled secure-boot.

Running nvidia-smi I have this output:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

and trying to run nvidia-settings I have this output:
ERROR: NVIDIA driver is not loaded
(nvidia-settings:100641): GLib-GObject-CRITICAL **: 22:04:40.924: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
** (nvidia-settings:100641): CRITICAL **: 22:04:40.924: ctk_powermode_new: assertion '(ctrl_target != NULL) && (ctrl_target->h != NULL)' failed
ERROR: nvidia-settings could not find the registry key file or the X server is not accessible. This file should have been installed along with this driver at /usr/share/nvidia/nvidia-application-profiles-key-documentation. The application profiles will continue to work, but values cannot be prepopulated or validated, and will not be listed in the help text. Please see the README for possible values and descriptions.
** (nvidia-settings:100641): WARNING **: 22:04:40.943: PRIME: Failed to execute child process “/usr/bin/prime-supported” (No such file or directory)
** Message: 22:04:40.943: PRIME: is it supported? no

Sorry if the post is not perfect, it’s my first time here.

Any help would be really appreciated.

nvidia-bug-report.log.gz (181.2 KB)

Log is flooded with

Mar 31 20:26:17 pop-os kernel: [  901.308171] NVRM: (PCI ID: 10de:2783) installed in this system has
Mar 31 20:26:17 pop-os kernel: [  901.308171] NVRM: fallen off the bus and is not responding to commands.

Please create a new log right after boot.
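To get a rough idea of how badly a saved kernel log is flooded, the NVRM messages quoted above can be counted with a short script. This is only a sketch: the function name `summarize_nvrm` is made up for illustration, and the patterns are based solely on the two lines quoted in this thread, so other driver versions may word the message differently.

```python
import re

# Matches the second line of the NVRM message pair quoted above.
FALLEN_OFF_BUS = re.compile(r"NVRM: .*fallen off the bus")
# Matches the PCI ID line, e.g. "(PCI ID: 10de:2783)".
PCI_ID = re.compile(r"NVRM: .*\(PCI ID: ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\)")


def summarize_nvrm(lines):
    """Count 'fallen off the bus' reports and collect the PCI IDs mentioned."""
    count = 0
    pci_ids = set()
    for line in lines:
        if FALLEN_OFF_BUS.search(line):
            count += 1
        match = PCI_ID.search(line)
        if match:
            pci_ids.add(match.group(1).lower())
    return count, sorted(pci_ids)


# The two log lines from this thread, used as sample input.
sample = [
    "Mar 31 20:26:17 pop-os kernel: [  901.308171] NVRM: (PCI ID: 10de:2783) installed in this system has",
    "Mar 31 20:26:17 pop-os kernel: [  901.308171] NVRM: fallen off the bus and is not responding to commands.",
]
print(summarize_nvrm(sample))
```

In practice the `lines` argument could be the lines of a text file holding saved kernel messages. A count that keeps growing after a fresh boot would be consistent with the GPU dropping off the bus repeatedly rather than once.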

Do you have any suggestions on how to fix it? I have updated the log files.

It’s a hardware issue. The GPU is reporting PCIe errors on all lanes:
LaneErrStat: LaneErr at lane: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
I guess it’s just incorrectly seated in its slot. Please reseat it, maybe even multiple times to clean dirt off the slot pins.

Also, please remove any overclocking settings in bios, if any.
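For anyone comparing their own log against the LaneErrStat line quoted above, here is a minimal sketch that pulls the lane numbers out of such a line. The helper name `lanes_with_errors` is hypothetical, and the parsing assumes the exact wording shown in this thread.

```python
def lanes_with_errors(line):
    """Return the lane numbers listed after 'LaneErr at lane:' in a line."""
    marker = "LaneErr at lane:"
    if marker not in line:
        return []
    tail = line.split(marker, 1)[1]
    # Keep only whitespace-separated tokens that are plain integers.
    return [int(tok) for tok in tail.split() if tok.isdigit()]


line = "LaneErrStat: LaneErr at lane: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15"
lanes = lanes_with_errors(line)
# True when every lane of an x16 link is listed.
print(lanes == list(range(16)))
```

With the line from this thread, all sixteen lanes of the x16 link are listed, which is what "errors on all lanes" refers to.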

Hi, a little update: your suggestion was right, but it was not the physical connection. The driver is working now, but I had to change the PCIe bifurcation setting from auto to x8x8. Everything works fine, but I don’t know whether this is the right way to solve the issue, or whether I’m cutting down on performance.

Do you have any suggestions, or can I leave my setup like this? I’m afraid something is wrong.
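On the performance question, the theoretical link bandwidth at x8 versus x16 can be worked out with a little arithmetic. The sketch below assumes a PCIe 4.0 link (16 GT/s per lane, 128b/130b encoding); the function name `pcie_bandwidth_gb_s` is made up for illustration. It gives only the raw one-direction link figure and says nothing about how much a particular game or workload would actually be affected.

```python
def pcie_bandwidth_gb_s(gt_per_s, lanes):
    """Theoretical one-direction bandwidth in GB/s for PCIe 3.0 and later.

    Assumes 128b/130b line encoding, which applies from PCIe 3.0 onwards.
    """
    bytes_per_transfer = (128 / 130) / 8  # payload bits per transfer, as bytes
    return gt_per_s * bytes_per_transfer * lanes


# Assumption: the link runs at PCIe 4.0 speed (16 GT/s per lane).
x16 = pcie_bandwidth_gb_s(16, 16)
x8 = pcie_bandwidth_gb_s(16, 8)
print(f"x16: {x16:.2f} GB/s, x8: {x8:.2f} GB/s")
```

Under these assumptions, forcing x8 halves the theoretical link bandwidth compared with x16.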

Please create a new nvidia-bug-report.log

Here it is; I just rebooted. Thank you for your patience.

nvidia-bug-report.log.gz (496.8 KB)

As expected, the GPU is running at x8 now. Did you also try setting x16 instead of “auto” in the BIOS? I suspect there’s something wrong with the mainboard. Is it still under warranty?
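The negotiated link width can also be checked without generating a full bug report. The following is a minimal sketch that reads the standard Linux sysfs PCI attributes (`vendor`, `current_link_width`, `max_link_width`, `current_link_speed`) for every device with NVIDIA's vendor ID 0x10de. The function names are made up for illustration, and the GPU's audio function will normally show up as a second entry.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # same vendor ID as in the "PCI ID: 10de:2783" log line


def read_attr(dev, name):
    """Read one sysfs attribute, returning None if it is missing or unreadable."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return None


def nvidia_link_status(sysfs_root="/sys/bus/pci/devices"):
    """List PCIe link attributes for all NVIDIA PCI functions under sysfs_root."""
    results = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return results
    for dev in sorted(root.iterdir()):
        if read_attr(dev, "vendor") != NVIDIA_VENDOR_ID:
            continue
        results.append({
            "address": dev.name,
            "current_width": read_attr(dev, "current_link_width"),
            "max_width": read_attr(dev, "max_link_width"),
            "current_speed": read_attr(dev, "current_link_speed"),
        })
    return results


if __name__ == "__main__":
    for entry in nvidia_link_status():
        print(entry)
```

A `current_width` of 8 next to a `max_width` of 16 would match the x8 state described here. Note that `max_width` reflects what the device itself supports, not what the slot or BIOS setting allows.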

There is no x16 option to select, just auto and the various bifurcation modes. Yes, it’s still under warranty. Should I try to get another one?

Yes, you should contact Gigabyte to get a correctly working system.

I hope that’s the problem and not the GPU. I guess I don’t have much choice. Thank you so much for the help! I will post updates.