I have three CUDA-capable NVIDIA cards on Ubuntu 14.04. All three run fine most of the time.
But sometimes nvidia-smi loses one card and shows only two.
This has happened many times, and rebooting Ubuntu fixes it every time.
Each of the three cards has gone missing at some point, so it does not seem to be related to a particular slot or power connector.
I would like a permanent fix, because rebooting always interrupts my work.
lspci shows all three cards:
lspci | grep NVIDIA
03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
03:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
05:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)
05:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
06:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
but nvidia-smi shows only two:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.13                 Driver Version: 378.13                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 0000:03:00.0     Off |                  N/A |
| 45%   34C    P8     7W / 160W |    169MiB /  8114MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 0000:06:00.0     Off |                  N/A |
|  0%   35C    P0    33W / 160W |      0MiB /  8114MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      3627    C   python                                          81MiB |
+-----------------------------------------------------------------------------+
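To spot which card has dropped out without comparing the two listings by eye, something like the following could help. It is only a rough sketch: the function names (gpus_from_lspci, gpus_from_smi, missing_gpus, check_live) are made up for illustration, and it assumes that nvidia-smi is asked for one bus ID per line via --query-gpu=pci.bus_id. It only reports the missing address and does not fix the underlying problem.

```python
import re
import subprocess

# PCI address such as "03:00.0", optionally prefixed by a domain ("0000:03:00.0").
PCI_ADDR = re.compile(
    r"(?:[0-9a-fA-F]{4,8}:)?([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)


def gpus_from_lspci(text):
    """Bus addresses of NVIDIA display controllers in plain `lspci` output."""
    addrs = []
    for line in text.splitlines():
        is_gpu = "VGA compatible controller" in line or "3D controller" in line
        if "NVIDIA" in line and is_gpu:
            m = PCI_ADDR.match(line.strip())
            if m:
                addrs.append(m.group(1).lower())
    return addrs


def gpus_from_smi(text):
    """Bus addresses from `nvidia-smi --query-gpu=pci.bus_id` output."""
    addrs = []
    for line in text.splitlines():
        m = PCI_ADDR.search(line)
        if m:
            addrs.append(m.group(1).lower())
    return addrs


def missing_gpus(lspci_text, smi_text):
    """Addresses that lspci lists but the driver does not report."""
    seen_by_driver = set(gpus_from_smi(smi_text))
    return [a for a in gpus_from_lspci(lspci_text) if a not in seen_by_driver]


def check_live():
    """Run both tools on this machine and return the missing addresses."""
    lspci_out = subprocess.check_output(["lspci"]).decode()
    smi_out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=pci.bus_id", "--format=csv,noheader"]
    ).decode()
    return missing_gpus(lspci_out, smi_out)
```

With the two listings above, missing_gpus would report 05:00.0, the card that lspci lists but nvidia-smi does not. Calling check_live() from a cron job or a loop would record when a card disappears, which might help match the event against kernel log messages from around the same time.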