[SOLVED] XID 62: fixable?

It started two days ago.

The GPU shows glitches whenever I run any graphics process, including nvidia-smi or nvidia-bug-report.

It works if I do "nothing" (in the console, of course). I can still use my Linux system through the console, although I can't see the text clearly.

But when I run nvidia-smi or nvidia-bug-report, it turns into this:

[   10.832686] nvidia: module license 'NVIDIA' taints kernel.
[   10.832688] Disabling lock debugging due to kernel taint
[   10.857575] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[   10.858021] vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[   10.858024] nvidia 0000:13:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   10.858207] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[   10.858208] NVRM: This can occur when a driver such as: 
[   10.858209] NVRM: Try unloading the conflicting kernel module (and/or
[   10.858211] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  387.12  Thu Sep 28 20:18:48 PDT 2017 (using threaded interrupts)
[   10.868786] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  387.12  Thu Sep 28 19:30:23 PDT 2017
[   10.878731] [drm] [nvidia-drm] [GPU ID 0x00001300] Loading driver
[   10.878733] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:13:00.0 on minor 0
[   38.219551] NVRM: Your system is not currently configured to drive a VGA console
[   38.243054] NVRM: GPU at PCI:0000:13:00: GPU-3f2b2eb6-9fcb-9fd8-00ca-9f2e0d9ada36
[   38.243056] NVRM: GPU Board Serial Number: 0321614030747
[   38.243059] NVRM: Xid (PCI:0000:13:00): 62, 1973(180c) 85000123 ffffff30
[  309.603535] NVRM: GPU at PCI:0000:13:00: GPU-3f2b2eb6-9fcb-9fd8-00ca-9f2e0d9ada36
[  309.603537] NVRM: GPU Board Serial Number: 0321614030747
[  309.603539] NVRM: Xid (PCI:0000:13:00): 62, 0bf9(1810) 00000000 00000000

The only way to recover is to reboot.

Tested with kernels 4.13.7 and 4.13.10 and NVIDIA drivers 387.12 and 387.22.

NVIDIA Titan Black (host) <- the "broken" unit (?)
NVIDIA Titan X Maxwell (guest, passed through to a KVM VM)
Mobo: EVGA SR-2
2x Xeon X5650
48 GB RAM
PSU: EVGA SuperNOVA 1000P (1 kW)
nvidia-bug-report.log.gz (222 KB)

Any thoughts?

Whenever I've seen this kind of video output over the past decades, the VGA card was broken. Looking at the logs, the errors are inconsistent: you mentioned XID 62 plus a crash, but the logs also show XID 38 plus a crash, and a crash with no XID at all. I'd say test the card in another computer and prepare for a replacement.
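The point about inconsistent errors (XID 62 in one crash, XID 38 in another, none in a third) can be checked by tallying the Xid codes in each saved kernel log. Below is a minimal Python sketch, assuming the `NVRM: Xid (PCI:...): <code>, ...` line format shown in the log above; `count_xids` and `SAMPLE` are hypothetical names for illustration, and the sketch only counts codes, it does not interpret what each code means.

```python
import re
from collections import Counter

# Assumed format, taken from the log above:
#   NVRM: Xid (PCI:0000:13:00): 62, 1973(180c) 85000123 ffffff30
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+)")


def count_xids(log_text):
    """Return a Counter mapping (pci_address, xid_code) -> occurrences."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            pci_addr, code = match.group(1), int(match.group(2))
            counts[(pci_addr, code)] += 1
    return counts


# Hypothetical sample: the two Xid lines from the log in this thread.
SAMPLE = """\
[   38.243059] NVRM: Xid (PCI:0000:13:00): 62, 1973(180c) 85000123 ffffff30
[  309.603539] NVRM: Xid (PCI:0000:13:00): 62, 0bf9(1810) 00000000 00000000
"""

for (pci_addr, code), n in sorted(count_xids(SAMPLE).items()):
    print(f"{pci_addr} Xid {code}: {n}x")
```

On a real machine you would pass the text of a saved kernel log (for example the output of dmesg captured after each crash) instead of `SAMPLE`; different codes across crashes on the same PCI address would match the "inconsistent errors" pattern described above.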


Reballing the unit solved the problem!

Only 60 bucks!