I have a 780 Ti on an EVGA X99 board, with two LG monitors attached to the single card.
I am getting random hard locks that are more frequent when doing graphics-intensive things, like playing with compositing or gaming.
The locks never occur when I am on a TTY - only when I’m on X.
The system is not overclocked, and as far as I can tell it is stable under mprime and memtest (over an hour in each).
If I boot without pci=nommconf in my kernel parameters, I get entries like these in my logs:
[ 3.986470] nvidia 0000:02:00.0: irq 79 for MSI/MSI-X
[ 3.989374] pcieport 0000:00:02.0: AER: Corrected error received: id=0010
[ 3.989381] pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0010(Transmitter ID)
[ 3.989383] pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00001000/00002000
[ 3.989385] pcieport 0000:00:02.0:  Replay Timer Timeout
[ 4.141789] pcieport 0000:00:02.0: AER: Corrected error received: id=0010
[ 4.141795] pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0010(Receiver ID)
[ 4.141797] pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00000040/00002000
[ 4.141798] pcieport 0000:00:02.0: [ 6] Bad TLP
pci=nommconf makes the errors disappear, but the lockups continue.
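For anyone reading the "error status/mask" fields in the log excerpt, here is a rough sketch of how the two hex values can be decoded. It assumes the bit layout of the PCIe AER Correctable Error Status register as I understand it from the PCIe spec; the table and the function name decode_aer_status are my own, not anything from the kernel or the NVIDIA driver.

```python
# Assumed bit positions in the PCIe AER Correctable Error Status register.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_aer_status(field):
    """Decode a 'status/mask' string such as '00001000/00002000'.

    Returns two lists of error names: the bits set in the status
    value and the bits set in the mask value.
    """
    status_hex, mask_hex = field.split("/")
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)

    def names(value):
        # Collect the name of every known bit that is set in value.
        return [
            name
            for bit, name in sorted(AER_CORRECTABLE_BITS.items())
            if value & (1 << bit)
        ]

    return names(status), names(mask)


if __name__ == "__main__":
    # The two status/mask pairs from the log lines above.
    for field in ("00001000/00002000", "00000040/00002000"):
        reported, masked = decode_aer_status(field)
        print(field, "reported:", reported, "masked:", masked)
```

Under that assumed layout, the status values match the text the kernel printed (Replay Timer Timeout and Bad TLP), and the mask value in both entries corresponds to the Advisory Non-Fatal Error bit.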
Does anyone have any ideas on what to check next?
nvidia-bug-report.log.gz (193 KB)