Unable to determine the device handle for gpu

GPU: Tesla K80
OS: Ubuntu 20.04.4 LTS 64-bit (bare metal, not a virtual machine)
nvidia-bug-report.sh log: nvidia-bug-report.log.gz (562.9 KB)

[English]
While running a program that uses CUDA, I suddenly get an error and the GPU is lost.

When I check with the nvidia-smi command, I get “unable to determine the device handle for gpu 0000:03:00.0:”.
After reboot, it is recognized again.

Does anyone know why the GPU is lost?

Translated with DeepL.

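Looks like you’re running the K80 in a desktop enclosure. The K80 is a passively cooled card that relies on server-chassis airflow, so in a desktop case it needs additional fans. Without them it is likely overheating and dropping off the bus.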
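To check whether heat is the cause, it may help to log the GPU temperature while the CUDA program runs and see whether it climbs just before the card disappears. Below is a minimal Python sketch, assuming nvidia-smi is on the PATH. The 80 °C warning threshold and the function names are illustrative assumptions, not NVIDIA-specified values.

```python
import shutil
import subprocess
import time

# Assumed warning threshold in degrees Celsius (illustrative only,
# not an official limit for the K80).
WARN_TEMP_C = 80


def parse_temperatures(output):
    """Parse lines like "0, 45" into a list of (gpu_index, temp_c) tuples.

    Lines that do not match (error messages, "[N/A]" values) are skipped.
    """
    readings = []
    for line in output.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        readings.append((int(parts[0]), int(parts[1])))
    return readings


def hot_gpus(readings, limit=WARN_TEMP_C):
    """Return the indices of GPUs at or above the temperature limit."""
    return [idx for idx, temp in readings if temp >= limit]


def query_temperatures():
    """Ask nvidia-smi for per-GPU temperatures; return [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_temperatures(result.stdout)


def poll(samples, interval_s=5.0):
    """Print a fixed number of temperature samples, flagging hot GPUs."""
    for _ in range(samples):
        readings = query_temperatures()
        if not readings:
            print("no readings (GPU lost or nvidia-smi unavailable)")
        else:
            print(readings, "hot:", hot_gpus(readings))
        time.sleep(interval_s)
```

Calling `poll(120, 5.0)` in a second terminal during the CUDA run would log about ten minutes of readings. If the temperature rises steadily and the readings stop right after it peaks, that would be consistent with the cooling explanation.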