Quadro 8000 crashes and leaves the process unkillable

I have a machine with four Quadro 8000 GPUs. When I start running a process on gpu_0, the GPU crashes:

  1. The fan of gpu_0 runs at max speed.
  2. ‘nvidia-smi’ shows ‘Unable to determine the device handle for GPU 0000:1A:00.0: Unknown Error’.
  3. The process on the GPU cannot be killed, even with ‘sudo kill -9’.
  4. The fan doesn’t stop, even after I shut down the machine. Only cutting off the power can stop the fan.
  5. The other three GPUs run the same process normally. Their fans stay at normal speed while gpu_0 is crashed.
  6. I moved gpu_0 to another slot, but the problem is still the same.

My bug report is attached:
nvidia-bug-report.log.gz (954.0 KB)