I have a machine with four Quadro 8000 GPUs. When I start running a process on gpu_0, the GPU crashes:
- The fan of gpu_0 runs at max speed.
- ‘nvidia-smi’ shows ‘Unable to determine the device handle for GPU 0000:1A:00.0: Unknown Error’.
- The process on the GPU cannot be stopped, even with ‘sudo kill -9’.
- The fan doesn’t stop, even after I shut down the machine. Only cutting off the power can stop the fan.
- The other three GPUs run the same process normally. Their fans stay at normal speed while gpu_0 is crashed.
- I moved gpu_0 to another slot, but the problem is still the same.
My bug report is attached:
nvidia-bug-report.log.gz (954.0 KB)
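In case it helps anyone script the same check, here is a minimal Python sketch that picks the PCI bus ID of the failed card out of saved ‘nvidia-smi’ output. The function name `failed_gpu_bus_ids` is hypothetical, and the only message format it assumes is the error line quoted above; it does not call the driver itself.

```python
import re

# PCI bus ID (domain:bus:device.function) as it appears in the error line.
# Assumption: the domain is printed with 4 to 8 hex digits.
_BUS_ID = re.compile(
    r"GPU ([0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])"
)


def failed_gpu_bus_ids(text):
    """Return the bus IDs of GPUs that nvidia-smi could not get a handle for."""
    ids = []
    for line in text.splitlines():
        if "Unable to determine the device handle" in line:
            match = _BUS_ID.search(line)
            if match:
                ids.append(match.group(1))
    return ids


# The error line from this report, used as sample input.
sample = "Unable to determine the device handle for GPU 0000:1A:00.0: Unknown Error"
print(failed_gpu_bus_ids(sample))
```

The returned bus ID can then be compared against the slot the card is currently installed in, which is useful after moving the card to see whether the error follows the card or stays with the slot.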