We have installed two Tesla K80 GPUs for computing, but they stop working a couple of minutes after we log in to Ubuntu. Both K80s report a normal state at first, but an error state soon appears in the output of “nvidia-smi”. We can also see that the K80 temperatures are extremely high just before the cards fail. The motherboard we are using is an ASUS X99-E WS. Can anyone tell us where the problem is? Is the motherboard incompatible, or is the cooling inadequate (currently two fans mounted above the cards)?
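In case it helps with diagnosis, here is a minimal sketch of how the temperatures could be logged in the minutes before the error state appears. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output. The function names and the 85 °C limit are placeholders of ours for illustration, not values taken from any K80 specification.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "index,name,temperature.gpu"


def parse_gpu_temps(csv_text):
    """Parse 'index, name, temperature' CSV rows into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        index, name, temp = (f.strip() for f in fields)
        try:
            temp_c = int(temp)
        except ValueError:
            temp_c = None  # the reading was not a number (e.g. "[N/A]")
        rows.append({"index": int(index), "name": name, "temp_c": temp_c})
    return rows


def read_gpu_temps():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_temps(result.stdout)


def hot_gpus(rows, limit_c=85):
    """Return readings at or above limit_c (85 is an assumed placeholder)."""
    return [r for r in rows if r["temp_c"] is not None and r["temp_c"] >= limit_c]
```

Calling `hot_gpus(read_gpu_temps())` every few seconds from a loop and printing the result with a timestamp should show whether the temperature climbs steadily until the error state appears.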