RTX 2080 Ti not detected by nvidia-smi on Ubuntu 19.04 or 18.04; boot screen shows random colored dots.

Things are a little complicated…

I just got a new machine. Everything was fine after installing Ubuntu 19.04 (the NVIDIA driver was installed automatically because I selected the option to install third-party drivers during the Ubuntu installation, and nvidia-smi worked).

However, after running a deep learning program (PyTorch) for only a few epochs, the system hung and I had to reboot. Since then, the system has never returned to normal.

I eventually reinstalled the system the same way as before, but the RTX 2080 Ti never came back!

Also, the first time I installed the system, I could set the BIOS default display to the GPU. The second time, the display only showed random colored stripes, so I had to switch the BIOS default display to the Intel integrated graphics.

Is this a GPU hardware issue?

Note: I also tried Ubuntu 18.04 (installing the NVIDIA driver manually), and the same issues remain.

The information below is from a clean machine (a fresh Ubuntu 19.04 install with nothing changed).
To be able to log in and collect it, I installed an additional GTX 960.

jeff@jeff:~$ nvidia-smi
Sat Aug 10 16:35:25 2019       
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 960     Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   53C    P8    12W / 160W |    209MiB /  2002MiB |      0%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0      1665      G   /usr/lib/xorg/Xorg                           100MiB |
|    0      1919      G   /usr/bin/gnome-shell                         105MiB |
jeff@jeff:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
03:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
03:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Controller (rev a1)
03:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 UCSI Controller (rev a1)

Other symptoms:

nvidia-smi is very slow.
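To put a number on "very slow", a small sketch can time a single nvidia-smi call. Note that the output above shows Persistence-M as Off; without persistence mode the driver releases GPU state when no client is using it, so each nvidia-smi call has to initialize it again, and a card that fails to initialize may add further delay. Timing the call before and after `sudo nvidia-smi -pm 1` may help separate the two causes. The helper `time_command` is hypothetical, and the 120-second timeout is an arbitrary assumption.

```python
from __future__ import annotations

import subprocess
import time


def time_command(cmd: list[str], timeout: float = 120.0) -> tuple[float, int]:
    """Run cmd once and return (wall-clock seconds, exit code).

    Raises subprocess.TimeoutExpired if the command hangs past `timeout`.
    """
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    elapsed, code = time_command(["nvidia-smi"])
    print(f"nvidia-smi exited with code {code} after {elapsed:.2f} s")
```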

Attached: nvidia-bug-report.log.gz (759 KB)

Random colored dots or stripes point to defective hardware (video memory). You could run gpu-burn for 10 minutes to confirm, then RMA the card.
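A sketch of that check in Python, assuming gpu-burn has been built locally as `./gpu_burn`, takes its run time in seconds as a positional argument, and ends with summary lines such as `GPU 0: OK` or `GPU 0: FAULTY`. Adjust the path and pattern if your build differs. Keep in mind that gpu-burn works through CUDA, so it can only exercise cards the driver has actually brought up. The helper names are hypothetical.

```python
from __future__ import annotations

import re
import subprocess

# Assumed summary format printed when gpu-burn finishes, e.g.
#   Tested 2 GPUs:
#       GPU 0: OK
#       GPU 1: FAULTY
SUMMARY_LINE = re.compile(r"^\s*GPU (\d+): (OK|FAULTY)\s*$")

TEN_MINUTES = 10 * 60  # gpu-burn run time, in seconds


def parse_summary(output: str) -> dict[int, bool]:
    """Return {gpu_index: passed} from gpu-burn's final summary lines."""
    results = {}
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            results[int(match.group(1))] = match.group(2) == "OK"
    return results


def run_gpu_burn(binary: str = "./gpu_burn", seconds: int = TEN_MINUTES) -> dict[int, bool]:
    """Run gpu-burn for `seconds` and parse its summary.

    The extra 300 s of timeout is an arbitrary margin for startup and teardown.
    """
    proc = subprocess.run([binary, str(seconds)], capture_output=True,
                          text=True, timeout=seconds + 300)
    return parse_summary(proc.stdout)


if __name__ == "__main__":
    results = run_gpu_burn()
    if not results:
        print("No summary found; check gpu-burn's raw output.")
    for index, passed in sorted(results.items()):
        print(f"GPU {index}: {'OK' if passed else 'FAULTY (consider RMA)'}")
```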