/dev/sda5: clean, files …, blocks… AND PCIe Bus Error: severity=Corrected after nvidia-driver install

I have a Dell laptop with an Nvidia Quadro 620M running Ubuntu 20.04.2 (alongside some remnants of Windows) with kernel 5.8.0-48. After installing the nvidia drivers the system won’t boot: it does not reach the login screen and just displays the two messages from the heading. I tried adding “pci=noaer” to GRUB, but that only gets rid of the PCIe Bus Error, and the system still won’t boot. Purging the nvidia drivers with “sudo apt purge nvidia” lets the system boot again, but I tried reinstalling the nvidia drivers in every way I could find on the internet and the problem always came back. Please help, I need nvidia working with CUDA for some computation!

Ubuntu 20 currently has bugs in its kernel/nvidia packages. Please try updating Ubuntu and then installing the driver from the graphics-drivers PPA.
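
For reference, the update-then-PPA route suggested above might look roughly like the sketch below. The function name `install_nvidia_from_ppa` is made up for illustration, and the default package `nvidia-driver-460` is only an assumption based on the 460 branch tried later in this thread. The steps are wrapped in a function so nothing runs until it is called.

```shell
# Sketch only: defines a helper, does not install anything by itself.
# Assumption: nvidia-driver-460 is the branch you want; pass another
# package name as the first argument to override it.
install_nvidia_from_ppa() {
    driver_pkg="${1:-nvidia-driver-460}"
    sudo apt update &&
    sudo apt upgrade &&                               # bring kernel and packages up to date first
    sudo add-apt-repository ppa:graphics-drivers/ppa &&
    sudo apt update &&                                # refresh package lists with the PPA enabled
    sudo apt install "$driver_pkg"
}
```

Calling `install_nvidia_from_ppa` and then rebooting would be the usual next step.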

Please also make sure no xorg.conf is created (/etc/X11/xorg.conf), otherwise delete it.
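
A quick way to check for that file, as a minimal sketch: the helper name `check_xorg_conf` is hypothetical, and the path defaults to the one named above.

```shell
# Report whether an xorg.conf exists at the given path
# (defaults to /etc/X11/xorg.conf).
check_xorg_conf() {
    conf="${1:-/etc/X11/xorg.conf}"
    if [ -e "$conf" ]; then
        echo "found: $conf"
        return 0
    fi
    echo "absent: $conf"
    return 1
}

check_xorg_conf || true

# If the file is found, moving it aside instead of deleting keeps a copy:
#   sudo mv /etc/X11/xorg.conf /etc/X11/xorg.conf.bak
```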

generix, thank you for replying. I updated and upgraded Ubuntu with sudo apt update and sudo apt upgrade, then installed the drivers again through the graphical interface “Software & Updates >> Additional Drivers”, trying both driver-460 and driver-450, but neither worked. I’m attaching a bug report, hope this will be of some help. I’d be very grateful for further assistance. nvidia-bug-report.log.gz (444.5 KB)

You’re getting XID 32 errors from the driver and xorg is crashing with a gpu exception. This is most likely a hardware defect, the XID 32 points to system memory. Please check/remove memory modules.
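
To see such Xid events on your own machine without opening the full bug report, something like this sketch can filter the kernel log. The helper name `xid_lines` is made up; it relies on the driver prefixing these messages with `NVRM: Xid`.

```shell
# Print only the kernel-log lines that report an NVIDIA Xid event.
# Reads from standard input; prints nothing if there are no matches.
xid_lines() {
    grep -F 'NVRM: Xid' || true
}

# Usage (either one):
#   sudo dmesg | xid_lines
#   journalctl -k -b | xid_lines
```

The number after the PCI address in each matching line is the Xid code.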

thanks for your reply, I assume you mean RAM modules, not GPU memory? Can I check them somehow without physically removing them, like with memtester/memtest86+? I’d be very grateful for further assistance.

Yes, system RAM. You can try memtest86 but I’m not too fond of it since it often can’t detect subtle memory defects.

Thank you for your reply. I removed each of the two RAM modules in turn, moved each of them to the other slot, and swapped them, but nothing helped. I’d be grateful for any other suggestion.

Not good. Please try booting without pci=noaer and create a new nvidia-bug-report.log, maybe this will shed some more light on what’s broken.
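
Before generating the new log it may be worth confirming that the option is really gone from the running kernel's command line. A minimal sketch follows; the helper name `has_noaer` is hypothetical. On Ubuntu the option is usually removed from `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub, followed by `sudo update-grub` and a reboot.

```shell
# Succeeds if pci=noaer appears as a whole word in the given
# kernel command line file (defaults to /proc/cmdline).
has_noaer() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -qwF 'pci=noaer' "$cmdline_file" 2>/dev/null
}

if has_noaer; then
    echo "pci=noaer is still active"
else
    echo "pci=noaer is not set"
fi

# Then create the fresh report with:
#   sudo nvidia-bug-report.sh
```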

thanks again for assistance. I’m attaching new log without pci=noaer: nvidia-bug-report.log.gz (444.5 KB)

That log was still using noaer, wrong file attached?

I apologize, my mistake, here is the correct one: nvidia-bug-report.log.gz (409.9 KB)

The AER errors come from the ethernet adapter, and only sporadically, so nothing that would keep the nvidia gpu from working.
I guess you’ll have to use gpu-burn or cuda-gpumemtest to check the gpu for damage.
Does it work properly in Windows, e.g. running some Unigine demo on it?
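
For anyone following along, building and running gpu-burn might look like the sketch below. It assumes the CUDA toolkit (nvcc) and git are already installed and that the wilicc/gpu-burn repository on GitHub is the one meant here. The function name `build_and_run_gpu_burn` is made up, and nothing runs until it is called.

```shell
# Sketch only: clone, build and run gpu-burn for a number of seconds.
# Assumption: the test duration is passed as a plain number of seconds.
build_and_run_gpu_burn() {
    seconds="${1:-60}"
    (
        # Subshell so the caller's working directory is left unchanged.
        git clone https://github.com/wilicc/gpu-burn.git &&
        cd gpu-burn &&
        make &&
        ./gpu_burn "$seconds"
    )
}
```

A short run such as `build_and_run_gpu_burn 60` is a reasonable first try before committing to an hour.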

I unfortunately have no working Windows partition on this laptop, but I’m working on it. I created an additional Ubuntu 18.04 partition and the symptoms were similar. After compiling, I ran the “gpu_burn -3600” command on the Ubuntu 20.04 partition and the output stayed as in the attached images for more than an hour; it seems to have got stuck. I’d be grateful for further assistance.

The gpu is crashing instantly, it’s broken.

generix, thank you very much for your assistance. I will arrange the swap; the laptop should still be covered by the manufacturer’s guarantee.