Hello,
our GPU node runs Ubuntu 20.04 with 8x NVIDIA A100 GPUs. After a restart, it seems that one of the GPUs can no longer be found.
Any ideas how to resolve this?
NVIDIA bug report:
nvidia-bug-report.log.gz (2.9 MB)
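To narrow down which of the 8 GPUs is missing, one option is to compare the NVIDIA devices visible on the PCI bus with the ones the driver reports. Below is a minimal sketch, assuming `lspci` and `nvidia-smi` are both on the PATH and that the A100s are listed by `lspci` as "3D controller" devices. The function names are made up for illustration, and the script only identifies the device, it does not fix anything.

```python
import re
import subprocess
from typing import Set

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID assigned to NVIDIA

# Matches a PCI address such as 0000:07:00.0 or 00000000:07:00.0
BUS_ID_RE = re.compile(r"^[0-9a-fA-F]{4,}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def normalize_bus_id(bus_id: str) -> str:
    """Lowercase the address and reduce the domain to at least 4 hex digits,
    so that lspci (4-digit domain) and nvidia-smi (8-digit domain) agree."""
    domain, bus, devfn = bus_id.strip().lower().split(":")
    domain = domain.lstrip("0").rjust(4, "0")
    return f"{domain}:{bus}:{devfn}"


def parse_lspci(output: str) -> Set[str]:
    """Collect bus IDs of GPU-class devices from `lspci -D -d 10de:` output.
    Assumption: GPUs appear as '3D controller' or 'VGA compatible controller';
    other NVIDIA devices (e.g. bridges) are skipped."""
    ids = set()
    for line in output.splitlines():
        address, _, description = line.strip().partition(" ")
        if BUS_ID_RE.match(address) and description.startswith(
            ("3D controller", "VGA compatible controller")
        ):
            ids.add(normalize_bus_id(address))
    return ids


def parse_nvidia_smi(output: str) -> Set[str]:
    """Collect bus IDs from `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader`.
    Lines that are not a plain PCI address (e.g. error messages) are ignored."""
    return {
        normalize_bus_id(line)
        for line in output.splitlines()
        if BUS_ID_RE.match(line.strip())
    }


def missing_gpus() -> Set[str]:
    """Return bus IDs that are on the PCI bus but not reported by the driver."""
    lspci = subprocess.run(
        ["lspci", "-D", "-d", f"{NVIDIA_VENDOR_ID}:"],
        capture_output=True, text=True, check=True,
    )
    # nvidia-smi may exit non-zero when a GPU is in a bad state, so do not check.
    smi = subprocess.run(
        ["nvidia-smi", "--query-gpu=pci.bus_id", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_lspci(lspci.stdout) - parse_nvidia_smi(smi.stdout)
```

Calling `print(missing_gpus())` on the node would print the PCI addresses of GPUs that the bus still sees but the driver does not. An empty result with only 7 GPUs in `nvidia-smi` would suggest the eighth device is not enumerated on the PCI bus at all.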
Thanks!