My rig works exactly once. I keep having to configure it over and over

TL;DR: I’m not sure whether this is an Ubuntu problem or an Nvidia problem. My configuration works exactly once and doesn’t survive a reboot.

I see the driver release dates line up with the testing I’m doing, so I hope to contribute something along the way. Part of my difficulty is that I’m chasing these behaviors with “black box” testing. I’ve never debugged drivers, so this part is new to me. The last several weeks have brought several updates to this range of drivers, and I’ve tried them all with the same results.

Ubuntu 22.04 LTS
Nvidia 3070 card (8GB)
Drivers 510 through 545 all show the behavior described below.

v545 finally worked, but this underlying issue has been a problem for weeks. Whenever I get a stable installation, it lasts for only one boot. Why?

Odd details:

When it crashes, I see what look like artifacts on the boot log screen. It appears to be trying to draw a rest API bubble or two before the graphics have finished loading. When this happens, boot gets stuck in a loop, endlessly trying to start sessions. Maybe something was left in there during testing, or it’s on my end? I don’t know.

When I’m not getting the above bubble artifacts, I get a black screen or green bars and dots and have to start over.

I’ve tried both the Ubuntu installs and the Nvidia drivers, plus the CUDA toolkit. I know how to clean up the libs and blacklist nouveau. This last round of testing was the first time driver 545 and CUDA 12.3 have worked for me. I had to mix the Ubuntu autoinstall with the Nvidia CUDA install to make that work.
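For reference, one way to confirm after a reboot that the nouveau blacklist is still in place is to scan the modprobe configuration. This is only a minimal Python sketch, not something that was run as part of the testing above. It assumes the blacklist lives in a `.conf` file under `/etc/modprobe.d/`, and the function names are made up for illustration.

```python
import glob
import re

# Matches an active (uncommented) "blacklist nouveau" line, optionally
# followed by a trailing comment.
_BLACKLIST_RE = re.compile(r"^\s*blacklist\s+nouveau\s*(#.*)?$", re.MULTILINE)


def blacklists_nouveau(conf_text):
    """Return True if the given modprobe config text blacklists nouveau."""
    return _BLACKLIST_RE.search(conf_text) is not None


def find_blacklist_files(pattern="/etc/modprobe.d/*.conf"):
    """Return the config files (assumed location) that blacklist nouveau."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                if blacklists_nouveau(fh.read()):
                    hits.append(path)
        except OSError:
            # Unreadable file: skip it rather than abort the scan.
            continue
    return hits


if __name__ == "__main__":
    files = find_blacklist_files()
    if files:
        print("nouveau is blacklisted in:", ", ".join(files))
    else:
        print("no active 'blacklist nouveau' line found")
```

Running it once right after a working install and again after the failing reboot would show whether the blacklist itself is what gets lost.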

If I could get this to just stick, I’d be plenty happy and could move on to studying AI.

Hi there @gmtrs and welcome to the NVIDIA developer forums.

What are “rest API bubbles”? Did you consider possible hardware issues?

In general the recommendation is to NOT mix installation methods. Ever.
Clean install Ubuntu.
Install CUDA, which will include(!) the correct driver, following the instructions to the letter.
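
To see whether a system currently carries traces of more than one installation method, a rough check along these lines may help. It is a hedged Python sketch, not an official tool. It assumes that runfile installs leave `/usr/bin/nvidia-uninstall` (driver) or `/usr/local/cuda-*/bin/cuda-uninstaller` (toolkit) behind, and that packaged installs show up in `dpkg-query` under names starting with `nvidia-`, `libnvidia-`, or `cuda`. The function names are hypothetical.

```python
import glob
import os
import subprocess

# Assumed package-name prefixes for driver/CUDA packages installed via apt.
DEB_PREFIXES = ("nvidia-", "libnvidia-", "cuda")

# Assumed leftovers of runfile-based installs.
RUNFILE_PATTERNS = (
    "/usr/bin/nvidia-uninstall",
    "/usr/local/cuda-*/bin/cuda-uninstaller",
)


def classify_install(deb_packages, runfile_markers):
    """Summarize which installation methods have left traces."""
    deb = sorted(p for p in deb_packages if p.startswith(DEB_PREFIXES))
    runfile = sorted(runfile_markers)
    return {"deb": deb, "runfile": runfile, "mixed": bool(deb) and bool(runfile)}


def installed_deb_packages():
    """Return names of fully installed packages (status 'ii') from dpkg."""
    out = subprocess.run(
        ["dpkg-query", "-W", "-f=${db:Status-Abbrev} ${Package}\\n"],
        capture_output=True, text=True, check=True,
    ).stdout
    names = []
    for line in out.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "ii":
            names.append(parts[1])
    return names


def runfile_markers_present():
    """Return the runfile leftovers that exist on this machine."""
    found = []
    for pattern in RUNFILE_PATTERNS:
        found.extend(p for p in glob.glob(pattern) if os.path.exists(p))
    return found


if __name__ == "__main__":
    report = classify_install(installed_deb_packages(), runfile_markers_present())
    print("packaged:", report["deb"])
    print("runfile leftovers:", report["runfile"])
    print("mixed installation methods:", report["mixed"])
```

If `mixed` comes back true, that is a hint to clean up and reinstall with a single method. It is not proof that the mix is what causes the crash.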

If that already causes a crash and reboot loop, it is very likely you have a hardware issue. Check the GPU seating in the PCIe slot, and check the CPU, temperatures, fans, etc.
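
Once the driver loads at least once, some of these checks can be read from `nvidia-smi`. The following is a minimal Python sketch under that assumption. It queries temperature, fan speed, and the current PCIe link generation and width. Values are kept as strings because some fields may report `[N/A]` on some boards, and the helper names are illustrative only.

```python
import subprocess

# Fields requested from nvidia-smi's --query-gpu interface.
QUERY_FIELDS = [
    "temperature.gpu",
    "fan.speed",
    "pcie.link.gen.current",
    "pcie.link.width.current",
]


def parse_smi_line(line):
    """Turn one CSV line of nvidia-smi output into a field -> value dict."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError(f"expected {len(QUERY_FIELDS)} values, got {len(values)}")
    return dict(zip(QUERY_FIELDS, values))


def read_gpu_health():
    """Query every visible GPU and return one dict per GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_smi_line(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for index, gpu in enumerate(read_gpu_health()):
        print(f"GPU {index}: {gpu}")
```

A link width below what the slot and card support can be one hint of a seating problem, though it is not a definitive test. The link generation often drops at idle to save power, so read it under load.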

A lot of people use Ubuntu with NVIDIA GPUs successfully, including with CUDA, so there is nothing “left in there during testing”.