Hey!
I recently noticed that the ASUS PRIME mainboards ship with PCI ASPM (Active State Power Management) disabled by default.
I turned on PCI ASPM in my BIOS settings (setting value: L0sL1) and indeed my PC uses less power.
However, sometimes the picture freezes and I have to reboot my machine. I get the following error messages in syslog:
Jun 14 20:49:15 midna kernel: pcieport 0000:00:01.0: AER: Multiple Corrected error received: 0000:00:01.0
Jun 14 20:49:15 midna kernel: pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Jun 14 20:49:15 midna kernel: pcieport 0000:00:01.0: device [8086:460d] error status/mask=00001000/00002000
Jun 14 20:49:15 midna kernel: pcieport 0000:00:01.0: [12] Timeout
Jun 14 20:49:15 midna kernel: pcieport 0000:00:01.0: AER: Error of this Agent is reported first
Jun 14 20:49:15 midna kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Jun 14 20:49:15 midna kernel: nvidia 0000:01:00.0: device [10de:2486] error status/mask=00001000/0000a000
Jun 14 20:49:15 midna kernel: nvidia 0000:01:00.0: [12] Timeout
Jun 14 20:49:15 midna kernel: snd_hda_intel 0000:01:00.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Jun 14 20:49:15 midna kernel: snd_hda_intel 0000:01:00.1: device [10de:228b] error status/mask=00001000/0000a000
Jun 14 20:49:15 midna kernel: snd_hda_intel 0000:01:00.1: [12] Timeout
Jun 14 20:52:08 midna kernel: pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:01.0
Jun 14 20:52:08 midna kernel: pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Jun 14 20:52:08 midna kernel: pcieport 0000:00:01.0: device [8086:460d] error status/mask=00100000/00010000
Jun 14 20:52:08 midna kernel: pcieport 0000:00:01.0: [20] UnsupReq (First)
Jun 14 20:52:08 midna kernel: pcieport 0000:00:01.0: AER: TLP Header: 34000000 01000010 00000000 00000000
Jun 14 20:52:08 midna kernel: nvidia 0000:01:00.0: AER: can't recover (no error_detected callback)
Jun 14 20:52:08 midna kernel: snd_hda_intel 0000:01:00.1: AER: can't recover (no error_detected callback)
Jun 14 20:52:08 midna kernel: pcieport 0000:00:01.0: AER: device recovery failed
Jun 14 20:52:23 midna kernel: NVRM: GPU at PCI:0000:01:00: GPU-13311476-4aa3-3cdf-28f1-5ffe801de085
Jun 14 20:52:23 midna kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Jun 14 20:52:23 midna kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
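For anyone reading the log above: the "error status/mask" fields are hexadecimal bitmasks, and the bracketed number on the following line ([12], [20]) appears to be the position of the set status bit. Here is a minimal Python sketch for pulling the bit positions out of such a line. The helper names set_bits and decode_status_mask are made up for illustration, and the code deliberately does not name the bits, since the same position means different things in the corrected and uncorrected registers.

```python
import re
from typing import List, Optional, Tuple


def set_bits(value: int) -> List[int]:
    """Return the positions of all set bits in a 32-bit register value."""
    return [bit for bit in range(32) if value & (1 << bit)]


def decode_status_mask(line: str) -> Optional[Tuple[List[int], List[int]]]:
    """Extract (status bits, mask bits) from an 'error status/mask=X/Y' log line.

    Returns None if the line does not contain such a field.
    """
    match = re.search(r"status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})", line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    return set_bits(status), set_bits(mask)


if __name__ == "__main__":
    sample = "pcieport 0000:00:01.0: device [8086:460d] error status/mask=00001000/00002000"
    print(decode_status_mask(sample))
```

Run against the corrected-error line of the root port, this gives status bit 12, which matches the "[12] Timeout" line that follows it in the log.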
I searched the forums for the error message, but all I could find were reports where the power supply or cabling turned out to be the issue.
I have two different machines that use the same mainboard model (PRIME Z690-A) but are otherwise different: different PSU, different CPU, different GPU (“Gigabyte GeForce RTX 3060 Ti Vision OC” in one, “MSI GeForce RTX 3060 Ti GAMING X TRIO” in the other).
I observe the issue in both machines once I turn on PCI ASPM.
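To confirm what the kernel side is doing on each machine, the active ASPM policy can be read from sysfs. The sketch below assumes the kernel exposes /sys/module/pcie_aspm/parameters/policy, where the active entry is shown in square brackets; the function name active_policy is just for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed location; only present when the kernel has PCIe ASPM support built in.
POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_policy(text: str) -> Optional[str]:
    """Return the bracketed entry from the policy file contents.

    For example, '[default] performance powersave' yields 'default'.
    Returns None if no entry is bracketed.
    """
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if __name__ == "__main__":
    if POLICY_PATH.exists():
        print("active ASPM policy:", active_policy(POLICY_PATH.read_text()))
    else:
        print("pcie_aspm policy file not found")
```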
Could you take a look and see if there are any known issues with power saving on NVIDIA cards under Linux? Any ideas what I could try?
Thanks