I set up a new Ubuntu Desktop 20.04 LTS system. I installed the 470 driver via Ubuntu's Additional Drivers tool. That worked, and from then on nvidia-smi showed correct output.
However, I needed to run CUDA, specifically 11.2. I decided to use the standalone .run file for that.
The installer told me that a system driver had been detected and that it was strongly recommended to remove it first. So I did.
The installer then started and reported a driver installation error. Some build step went wrong; I don't know which.
I then read that CUDA 11.2 only has a minimum driver requirement, so I wanted to use it with the previously installed driver. I removed the installer (nothing had been installed yet) and reinstalled the 470 driver via Ubuntu's tool. That worked without any error, but since then:
# nvidia-smi
No devices were found
And this is the current state. I purged EVERYTHING NVIDIA-related several times. All kernel modules and any traces of NVIDIA were removed. I reinstalled the driver a hundred times - nothing. Switched to 495 - nothing.
I am not able to make the driver work correctly anymore. The CUDA test installation detected the driver but was not able to open the card.
[ 31.030373] NVRM: GPU 0000:04:00.0: Failed to copy vbios to system memory.
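For anyone who wants to check their own kernel log for messages like the one above, here is a minimal Python sketch. It assumes the log text has already been captured (for example, the saved output of `dmesg`) and that NVRM lines follow the same layout as the line quoted here. The names `NVRM_LINE` and `find_nvrm_messages` are made up for this example and are not part of any NVIDIA tool.

```python
import re

# Assumed layout, based only on the line quoted in this thread:
# [ 31.030373] NVRM: GPU 0000:04:00.0: Failed to copy vbios to system memory.
# The "GPU <pci address>:" part is treated as optional.
NVRM_LINE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+NVRM:\s+"
    r"(?:GPU\s+(?P<pci>[0-9a-fA-F:.]+):\s+)?"
    r"(?P<msg>.*)$"
)


def find_nvrm_messages(log_text):
    """Return (timestamp, pci_address_or_None, message) for each NVRM line."""
    results = []
    for line in log_text.splitlines():
        match = NVRM_LINE.match(line.strip())
        if match:
            results.append(
                (float(match["time"]), match["pci"], match["msg"])
            )
    return results


# Usage with the line from this thread as sample input.
sample = "[ 31.030373] NVRM: GPU 0000:04:00.0: Failed to copy vbios to system memory."
for timestamp, pci, message in find_nvrm_messages(sample):
    print(timestamp, pci, message)
```

This only filters and splits the lines; it does not interpret the error or say anything about whether the card itself is at fault.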
Not looking good; it seems the vbios got corrupted. Please power down the system, disconnect it from power, let it sit unpowered for 30 minutes, then try again. Is the 3080 still under warranty?
Cosmic rays? A failing flash ROM? It shouldn't happen but sometimes does, mostly on older cards. It can often be fixed by re-flashing the same vbios image, but that image is usually hard to find, and re-flashing is not recommended while the card is still under warranty. Which vendor/brand is the GPU?
We took out the 3080 and put it into another machine running Windows. The card worked well in that system, and the driver installed without issues. There was no noticeable problem. Does Windows behave differently here? Does Windows not need to copy the vbios?
On rare occasions, this also happens due to kernel/driver bugs, but I'm not aware of any at the moment. Did you already try re-inserting it into the Linux system?
Next update: We tested it again today and now the 3080 has stopped working completely. No picture at all in the Windows test system. So we are trying to RMA it now.