Hello guys,
I have a problem with T4 GPUs. I installed them in several different servers and ran into several problems:
On some servers, the GPUs weren't detected at all, whether or not the driver and CUDA were installed.
Sometimes they were detected, but after a reboot they no longer were; they would then reappear after some later reboot.
Servers would crash for no apparent reason (no temperature problem or anything) after around 20 minutes.
Here is what I get the few times it works:
uname -a on Ubuntu 20.04:
Linux *** 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
uname -a on Ubuntu 18.04:
Linux *** 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
lspci:
01:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
nvidia-smi:
| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:01:00.0 Off |                    0 |
| N/A   82C    P0    69W /  70W |  13620MiB / 15109MiB |    100%      Default |
|                               |                      |                  N/A |
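In case it helps anyone reproduce these checks after each reboot, here is a minimal Python sketch that tests whether `lspci` output lists an NVIDIA device and reads the same values shown in the table above (temperature, power draw, utilization, memory used) from `nvidia-smi`. The function names are my own and purely illustrative. The query fields are ones listed by `nvidia-smi --help-query-gpu`. The parser assumes every field comes back as a number; a field the driver reports as not available would raise a `ValueError`.

```python
import subprocess

# Fields requested from nvidia-smi; see `nvidia-smi --help-query-gpu`.
QUERY_FIELDS = "temperature.gpu,power.draw,utilization.gpu,memory.used"


def has_nvidia_device(lspci_output: str) -> bool:
    """Return True if any line of `lspci` output mentions an NVIDIA device."""
    return any("NVIDIA" in line for line in lspci_output.splitlines())


def parse_sample(csv_line: str) -> dict:
    """Parse one line produced with --format=csv,noheader,nounits.

    Assumes all four fields are numeric (hypothetical helper, not an
    NVIDIA API).
    """
    temp, power, util, mem = (field.strip() for field in csv_line.split(","))
    return {
        "temp_c": int(temp),
        "power_w": float(power),
        "util_pct": int(util),
        "mem_mib": int(mem),
    }


def read_sample() -> dict:
    """Query the first GPU once; raises if nvidia-smi is missing or fails."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])
```

Calling `read_sample()` in a loop and appending the result to a file with a timestamp would leave a record of the last readings before a crash, and `has_nvidia_device()` can be fed the text of `lspci` to note on which boots the card shows up.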
Has anyone had similar problems with T4 GPUs on Ubuntu?
Do you know what I could do to solve this?
With Ubuntu 18.04:
nvidia-bug-report.log.gz (56.4 KB)
(GPU not detected)
With Ubuntu 20.04:
ubuntu2004-nvidia-bug-report.log.gz (401.7 KB)
(GPU has been detected)