RTX 2070 - NVIDIA 510.60.02 - Freezes (GPU has fallen off the Bus)

nvidia-bug-report.log.gz (247.4 KB)
The PC freezes randomly, especially when running games.
The log contains this entry (GPU has fallen off the bus):

Mär 28 13:30:51 seb kernel: NVRM: GPU at PCI:0000:01:00: GPU-17becba7-9540-63e6-c6a2-6f931a9958e3
Mär 28 13:30:51 seb kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=0, GPU has fallen off the bus.
Mär 28 13:30:51 seb kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Mär 28 13:30:51 seb kernel: NVRM: GPU 0000:01:00.0: GPU serial number is .
Mär 28 13:30:51 seb kernel: NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
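To check whether further Xid events were logged before other freezes, the kernel log can be filtered for the NVRM Xid lines. This is a minimal sketch that assumes the message format shown above. `xid_events` is a hypothetical helper name, not part of any NVIDIA tool:

```shell
#!/bin/sh
# Minimal sketch: print "<PCI address> <Xid code>" for every NVRM Xid line
# read from stdin. Assumes the message format shown in the log above, e.g.
#   NVRM: Xid (PCI:0000:01:00): 79, pid=0, GPU has fallen off the bus.
# xid_events is a hypothetical helper name.
xid_events() {
  sed -n -E 's/.*NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): ([0-9]+),.*/\1 \2/p'
}
```

For example, `journalctl -k -b -1 | xid_events` scans the kernel messages of the previous boot, which only works if journald keeps a persistent journal. Xid 79 is the "fallen off the bus" code seen here.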

The most common causes on desktop systems are overheating and insufficient power.
Your GPU is quite hot (62°C) while idle. Please check the fans and airflow, and monitor temperatures.
To rule out power issues during boost, you can try limiting clocks using nvidia-smi -lgc.
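The clock-limiting suggestion can be wrapped in two small functions, one that locks the graphics clock to a MIN,MAX range with `nvidia-smi -lgc` and one that undoes it with `nvidia-smi -rgc`. This is a minimal sketch. The helper names and the `NVSMI` override used for dry runs are assumptions, and both nvidia-smi calls need root:

```shell
#!/bin/sh
# Minimal sketch: lock the GPU graphics clock to a MIN..MAX range (MHz) with
# nvidia-smi -lgc, and undo it with nvidia-smi -rgc. Both need root.
# lock_gpu_clocks / reset_gpu_clocks are hypothetical helper names; setting
# NVSMI to a stub command gives a dry run.
NVSMI=${NVSMI:-nvidia-smi}

# Succeeds only for a non-empty string of digits.
is_uint() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Validate MIN <= MAX, then pass them to nvidia-smi as "MIN,MAX".
lock_gpu_clocks() {
  if ! is_uint "$1" || ! is_uint "$2" || [ "$1" -gt "$2" ]; then
    echo "usage: lock_gpu_clocks MIN_MHZ MAX_MHZ (MIN <= MAX)" >&2
    return 1
  fi
  "$NVSMI" -lgc "$1,$2"
}

# Return to the default clock behaviour.
reset_gpu_clocks() {
  "$NVSMI" -rgc
}
```

In a root shell (for example after `sudo -i`), source the script, run something like `lock_gpu_clocks 300 1500` (illustrative values), play until a freeze would normally occur, then run `reset_gpu_clocks`. If the freezes stop only with a lowered maximum clock, that points towards power delivery during boost.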

So I installed lm_sensors and ran it:

Package id 0: +54.0°C (high = +82.0°C, crit = +100.0°C)
Core 0: +52.0°C (high = +82.0°C, crit = +100.0°C)
Core 1: +50.0°C (high = +82.0°C, crit = +100.0°C)
Core 2: +50.0°C (high = +82.0°C, crit = +100.0°C)
Core 3: +53.0°C (high = +82.0°C, crit = +100.0°C)
Core 4: +54.0°C (high = +82.0°C, crit = +100.0°C)
Core 5: +52.0°C (high = +82.0°C, crit = +100.0°C)

That looks really bad.
But after cleaning the dust out of the fans and vent openings, I got this. Now nothing is above 50 degrees:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +45.0°C (high = +82.0°C, crit = +100.0°C)
Core 0: +44.0°C (high = +82.0°C, crit = +100.0°C)
Core 1: +43.0°C (high = +82.0°C, crit = +100.0°C)
Core 2: +42.0°C (high = +82.0°C, crit = +100.0°C)
Core 3: +42.0°C (high = +82.0°C, crit = +100.0°C)
Core 4: +45.0°C (high = +82.0°C, crit = +100.0°C)
Core 5: +43.0°C (high = +82.0°C, crit = +100.0°C)

I still have to test whether it still freezes.

Those are CPU temperatures. Check the GPU temperatures instead, using nvidia-smi or nvidia-settings to monitor them.
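To log GPU (not CPU) readings during a game session and check the peaks afterwards, nvidia-smi's CSV query mode can be combined with a small awk filter. This is a minimal sketch. The choice of query fields and the `gpu_peaks` helper name are assumptions:

```shell
#!/bin/sh
# Minimal sketch: report the peak temperature and power draw from a log of
#   nvidia-smi --query-gpu=temperature.gpu,fan.speed,clocks.gr,power.draw \
#              --format=csv,noheader,nounits -l 1
# i.e. lines like "79, 52, 1905, 172.40" (temp C, fan %, clock MHz, power W).
# gpu_peaks is a hypothetical helper; non-numeric fields such as [N/A] count as 0.
gpu_peaks() {
  awk -F', *' '
    $1 + 0 > maxt { maxt = $1 + 0 }
    $4 + 0 > maxp { maxp = $4 + 0 }
    END { printf "max temp: %d C, max power: %.1f W\n", maxt, maxp }
  '
}
```

For example, append `| tee gpu.log` to the nvidia-smi command in the comment while playing, stop it with Ctrl+C, then run `gpu_peaks < gpu.log`.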

I checked with nvidia-smi, and the normal temperature is:
28% fan → 48°C
But in-game (I tested with Yooka-Laylee), the GPU temperature rose to:
52% fan → 79-81°C

I'll check whether I can clean more dust out of the card and then reinstall it. Something feels off.

Those temperatures are fine. Please check for insufficient power by limiting clocks.

I tested the card with the clock locked to the lowest value (1), to 1870, and to the maximum (2304). It did not freeze, and the temperature stayed the same.

sudo nvidia-smi -lgc 1
sudo nvidia-smi -lgc 1870
sudo nvidia-smi -lgc 2304

The temperature stayed the same, so it does not look like a power problem.