Things are a little complicated…
I just got a new machine, and everything was fine after installing Ubuntu 19.04. The NVIDIA driver was installed automatically because I selected the option to install third-party drivers during the Ubuntu installation, and nvidia-smi worked.
However, after running just a few epochs of a deep learning program (PyTorch), the system hung and I had to reboot. Since then, the system has never returned to normal.
I eventually reinstalled the system the same way as before, but the RTX 2080 Ti never came back!
Also, the first time I installed the system I could set the BIOS default display to the GPU. The second time, the display showed only random colored stripes, so I had to switch the BIOS default display to the Intel integrated graphics.
Is this a GPU hardware issue?
Note: I have also tried Ubuntu 18.04 (installing the NVIDIA driver manually), and the same issues remain.
The following information is from a clean machine (freshly installed Ubuntu 19.04, nothing changed).
To log in and collect the information below, I installed an additional GTX 960.
jeff@jeff:~$ nvidia-smi
Sat Aug 10 16:35:25 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   53C    P8    12W / 160W |    209MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1665      G   /usr/lib/xorg/Xorg                           100MiB |
|    0      1919      G   /usr/bin/gnome-shell                         105MiB |
+-----------------------------------------------------------------------------+
jeff@jeff:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
03:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
03:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Controller (rev a1)
03:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 UCSI Controller (rev a1)
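In short, lspci still lists the RTX 2080 Ti at 03:00.0, but nvidia-smi only reports the GTX 960 at 01:00.0. The card still enumerates on the PCI bus, but the driver is not bringing it up. Below is a minimal Python sketch of that cross-check. It assumes lspci's default output format and the `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader` query. The helper names (`lspci_nvidia_gpus`, `smi_bus_ids`, `missing_gpus`, `check_this_machine`) are hypothetical, made up for illustration.

```python
import re
import subprocess

# Matches default-format lspci lines such as:
# 03:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
LSPCI_GPU = re.compile(
    r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?:VGA compatible controller|3D controller): NVIDIA Corporation (.+)$",
    re.IGNORECASE,
)


def lspci_nvidia_gpus(lspci_text):
    """Map bus:device.function -> device name for NVIDIA GPUs listed by lspci."""
    gpus = {}
    for line in lspci_text.splitlines():
        match = LSPCI_GPU.match(line.strip())
        if match:
            gpus[match.group(1).lower()] = match.group(2)
    return gpus


def smi_bus_ids(smi_text):
    """Collect bus IDs printed by nvidia-smi's pci.bus_id query.

    nvidia-smi prefixes a PCI domain (e.g. 00000000:01:00.0), which lspci
    omits by default, so only the trailing bus:device.function is kept.
    """
    return {line.strip().lower()[-7:] for line in smi_text.splitlines() if line.strip()}


def missing_gpus(lspci_text, smi_text):
    """GPUs visible on the PCI bus but not reported by the NVIDIA driver."""
    present = smi_bus_ids(smi_text)
    return {bus: name for bus, name in lspci_nvidia_gpus(lspci_text).items()
            if bus not in present}


def check_this_machine():
    """Run both tools locally (assumes lspci and nvidia-smi are on PATH)."""
    lspci = subprocess.run(["lspci"], capture_output=True, text=True, check=True).stdout
    smi = subprocess.run(
        ["nvidia-smi", "--query-gpu=pci.bus_id", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return missing_gpus(lspci, smi)
```

With the lspci output above and nvidia-smi reporting only 00000000:01:00.0, `missing_gpus` flags the 03:00.0 TU102 entry. This only shows that the card enumerates but the driver does not initialize it. It does not by itself tell a hardware fault apart from a driver problem.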
Other symptom:
nvidia-smi is very slow.
Attachment: nvidia-bug-report.log.gz (759 KB)