We are running Ubuntu, and one of our A6000 cards has a fan error. nvidia-smi shows a fan speed of 0%, and when booting Linux we get a loop of errors.
This happens during boot-up, and we need to hard reboot multiple times.
Linux reports an unknown PCI header type.
Using driver 535.113.01
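For anyone who wants to check the same symptom without watching nvidia-smi by hand, here is a minimal sketch that parses the CSV output of `nvidia-smi --query-gpu=index,name,fan.speed,temperature.gpu --format=csv,noheader,nounits` and flags cards reporting 0% fan while hot. The function names and the 80 C threshold are my own assumptions for illustration, not values from NVIDIA.

```python
import csv
import io
import subprocess

QUERY = "index,name,fan.speed,temperature.gpu"


def read_gpu_status():
    """Run nvidia-smi and return its CSV output as text (needs a working driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_stalled_fans(csv_text, temp_limit_c=80):
    """Return (index, name, temp) for GPUs reporting 0% fan at or above temp_limit_c.

    temp_limit_c is an arbitrary, hypothetical threshold; adjust it for your card.
    """
    stalled = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        index, name, fan, temp = (field.strip() for field in row)
        if not (fan.isdigit() and temp.isdigit()):
            continue  # e.g. "[N/A]" when the board exposes no fan reading
        if int(fan) == 0 and int(temp) >= temp_limit_c:
            stalled.append((int(index), name, int(temp)))
    return stalled
```

On a machine where the driver still loads, `find_stalled_fans(read_gpu_status())` returns the suspicious cards. A fan at 0% on its own is not proof of a fault, which is why the sketch also looks at temperature.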
I suspect the fan is broken, so the GPU overheats and shuts down. The error (Unknown PCI header type ‘127’ (ff)) points to that.
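A short sketch of why that message fits a card that has dropped off the bus, assuming the "(ff)" in the message is the raw Header Type byte. The Header Type register sits at offset 0x0E of PCI configuration space: bit 7 is the multi-function flag and the low 7 bits are the layout (0 for an endpoint, 1 for a PCI-to-PCI bridge). When no device answers, config reads return all ones, so the byte is 0xFF and the layout decodes to 127. The helper names below are hypothetical; the sysfs path is the standard Linux one.

```python
HEADER_TYPE_OFFSET = 0x0E  # Header Type byte in PCI configuration space


def decode_header_type(raw_byte):
    """Split the Header Type byte into (layout, multi_function)."""
    layout = raw_byte & 0x7F            # bits 6:0, valid values are 0, 1, 2
    multi_function = bool(raw_byte & 0x80)  # bit 7
    return layout, multi_function


def looks_like_missing_device(config_bytes):
    """True if every byte is 0xFF, which is what reads from an absent device return."""
    return len(config_bytes) > 0 and all(b == 0xFF for b in config_bytes)


def read_config_header(address):
    """Read the first 64 config bytes of a device from sysfs (Linux only)."""
    with open(f"/sys/bus/pci/devices/{address}/config", "rb") as f:
        return f.read(64)


# 0xFF & 0x7F == 0x7F == 127, matching the number in the error message.
print(decode_header_type(0xFF))
```

To try it on a real card, pass the device's PCI address as shown by `lspci -D` to `read_config_header` and feed the result to `looks_like_missing_device`.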
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
This is rather embarrassing, but… the GPU cable underneath the fan was preventing it from spinning. The workstation is built in such a way that you cannot see this unless you stick your finger under the card.
Problem solved and stress (fear) levels back to baseline.