I set up a new machine with an A100-40G. When the computer first starts up, the A100 operates normally, but after it has been running for a while the GPU becomes unusable.
At that point, the nvidia-smi command returns “Unable to determine the device handle for GPU0000:82:00.0: Unknown Error”.
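To pin down when the GPU drops out, something like the following could be run periodically. It is only a minimal sketch: the helper names `failed_gpu_address` and `check_gpu` are hypothetical (not part of any NVIDIA tool), and the pattern assumes the error text has the same shape as the message quoted above.

```python
import re
import shutil
import subprocess

# Matches the PCI address in an error shaped like the one quoted above.
# Assumption: the address may or may not be separated from "GPU" by a space.
_HANDLE_ERROR = re.compile(
    r"Unable to determine the device handle for GPU\s*"
    r"([0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])"
)


def failed_gpu_address(smi_output):
    """Return the PCI address named in the error message, or None if absent."""
    match = _HANDLE_ERROR.search(smi_output)
    return match.group(1) if match else None


def check_gpu():
    """Run nvidia-smi once and return the failed PCI address, if any.

    Returns None both when no such error is reported and when
    nvidia-smi is not found on PATH.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    # The error may be printed on either stream, so both are searched.
    return failed_gpu_address(result.stdout + result.stderr)
```

Calling `check_gpu()` from a cron job or a simple loop and logging a timestamp whenever it returns an address would give a failure time that can be compared against the kernel log.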
nvidia-bug-report.log.gz (201.3 KB)