Fan at 0% and errors during boot

We are running Ubuntu, and one of our A6000 cards has a fan error. nvidia-smi shows a fan speed of 0%, and when booting Linux we get a loop of errors.


This happens during boot, and we have to hard reboot multiple times.

Linux reports an unknown PCI header type.

We are using driver 535.113.01.

I suspect the fan is broken, so the GPU overheats and shuts down. The error (Unknown PCI header type ‘127’ (ff)) points to that.
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
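
For anyone wondering why that error hints at a card that has dropped off the bus: reads from a PCI device that no longer responds come back as all ones, so the Header Type byte (config space offset 0x0E) reads as 0xff. Bit 7 of that byte is the multi-function flag and bits 0-6 are the header layout, so masking off bit 7 leaves 0x7f, which is 127. Below is a minimal sketch of that arithmetic, assuming the raw byte was 0xff; `decode_header_type` is a hypothetical helper written for illustration, not part of any driver or tool.

```python
def decode_header_type(raw: int) -> dict:
    """Split a PCI Header Type byte (config offset 0x0E) into its fields."""
    layout = raw & 0x7F                # bits 0-6: header layout
    multifunction = bool(raw & 0x80)   # bit 7: multi-function flag
    known_layouts = {
        0: "general device (type 0)",
        1: "PCI-to-PCI bridge (type 1)",
        2: "CardBus bridge (type 2)",
    }
    return {
        "layout": layout,
        "multifunction": multifunction,
        "description": known_layouts.get(layout, "unknown"),
        # 0xff usually means the read itself failed (device not responding)
        "all_ones": raw == 0xFF,
    }


info = decode_header_type(0xFF)
print(info["layout"], info["description"], info["all_ones"])
```

A healthy GPU would normally decode to layout 0, so a layout of 127 together with `all_ones` being true suggests the device stopped answering rather than that it has an exotic header.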

This is rather embarrassing, but… the GPU cable underneath the fan was preventing it from spinning. The workstation is built so that there is no way to notice this visually unless you stick your finger under the card.

Problem solved and stress (fear) levels back to baseline.
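
For anyone who lands here with the same symptom, here is a rough sketch of how one might watch for a GPU that reports 0% fan speed while running hot. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` fields `index`, `name`, `fan.speed` and `temperature.gpu`. The helper names and the 60 °C threshold are my own assumptions, not anything from the driver. As far as I understand the nvidia-smi documentation, `fan.speed` is the speed the fan is intended to run at, so a nonzero reading does not prove the fan is actually turning.

```python
import subprocess

QUERY_FIELDS = "index,name,fan.speed,temperature.gpu"


def parse_gpu_status(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for line in text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            continue  # skip lines that do not match the requested fields
        index, name, fan, temp = fields
        rows.append({
            "index": int(index),
            "name": name,
            # non-numeric values such as "[N/A]" become None
            "fan_percent": int(fan) if fan.isdigit() else None,
            "temp_c": int(temp) if temp.isdigit() else None,
        })
    return rows


def read_gpu_status():
    """Run nvidia-smi and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_status(result.stdout)


def stalled_fans(rows, hot_c=60):
    """Return GPUs reporting 0% fan at or above hot_c (arbitrary threshold)."""
    return [
        row for row in rows
        if row["fan_percent"] == 0
        and row["temp_c"] is not None
        and row["temp_c"] >= hot_c
    ]
```

On the affected machine, `stalled_fans(read_gpu_status())` would list any card that looks suspicious. An empty list is not a clean bill of health, so a physical check of the fan is still worth doing.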