Unable to determine the device handle for GPU when running multi-GPU experiments

Hello, we have a server with four GTX 1080 Ti GPUs. Occasionally, when experiments are running on all four GPUs, we get the error in the title and have to reboot the server. I have attached the bug report.

nvidia-bug-report.log.gz (2.7 MB)