FAN and PWR show ERR during DL training

Hi! When I use PyTorch for deep learning training, one of the GPUs sometimes fails. I am not very familiar with hardware. Please help!


nvidia-bug-report.log (2.98 MB)