I’m having trouble with an RTX A6000 installed as the only GPU on a SuperMicro H12DSG-O-CPU mainboard, running Ubuntu Server 21.04 with NVIDIA driver version 470.63.01. The GPU is recognized by the system at boot, but shortly thereafter I get an Xid 62 error in `dmesg`, and soon after that I get other Xid errors, usually ending with Xid 79. Initially suspecting a power issue, I’ve tried connecting the GPU to different mainboard power outputs, but the results don’t change much – some details differ, but it’s unclear to me whether that has any real influence.
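In case it helps anyone reading along, here is a minimal sketch of pulling just the Xid events out of saved `dmesg` text, so the order of the codes is easier to see. It assumes the usual `NVRM: Xid (PCI:<bus id>): <code>, <details>` line shape; the function name `extract_xid_events` is made up for illustration and is not part of any NVIDIA tool.

```python
import re

# Assumed line shape (not taken from the attached logs):
#   NVRM: Xid (PCI:<bus id>): <code>, <details>
XID_RE = re.compile(
    r"NVRM: Xid \((?P<device>PCI:[0-9a-fA-F:.]+)\): (?P<code>\d+),?\s*(?P<detail>.*)"
)


def extract_xid_events(log_text):
    """Return (device, xid_code, detail) tuples in the order they appear."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (
                    match.group("device"),
                    int(match.group("code")),
                    match.group("detail").strip(),
                )
            )
    return events
```

To use it, save the kernel log to a file (for example with `sudo dmesg > dmesg.txt`) and pass the file's contents to `extract_xid_events`; lines that don't match the assumed shape are simply skipped.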
I’ve attached some outputs from `nvidia-bug-report.sh`; the “1a” and “1b” reports are from before and after the Xid 79 appears in `dmesg`, with the only command issued by me prior to Xid 79 appearing being `sudo nvidia-bug-report.sh`.
I’m getting another 8-pin-to-8-pin power cable to see if the cable itself is an issue, but in the meantime, if these log files provide any other insights as to what might be amiss, I’ll be grateful for any help any of y’all can offer!