I am using Ubuntu 16.04 with the nvidia-375 driver and three GPUs (Compute Mode = Default):
- GTX Titan X
- Quadro M6000
- GTX 580
When I run the 0_Simple/simpleMultiGPU sample, I get “code=46(cudaErrorDevicesUnavailable)”, which translates to “all CUDA-capable devices are busy or unavailable”.
I encounter the same error in my own code:
- sometimes only the Titan X and M6000 work, while the GTX 580 is busy
- sometimes (after relaunching the application) only the GTX 580 works, while the Titan X and M6000 are busy
These launches work:
- CUDA_VISIBLE_DEVICES=0 ./simpleMultiGPU
- CUDA_VISIBLE_DEVICES=1 ./simpleMultiGPU
- CUDA_VISIBLE_DEVICES=2 ./simpleMultiGPU
- CUDA_VISIBLE_DEVICES=0,2 ./simpleMultiGPU
But these don’t work:
- CUDA_VISIBLE_DEVICES=0,1 ./simpleMultiGPU
- CUDA_VISIBLE_DEVICES=1,2 ./simpleMultiGPU
- CUDA_VISIBLE_DEVICES=0,1,2 ./simpleMultiGPU
The GTX 580 corresponds to index “1” in these commands (nvidia-smi, whose output is below, lists it as GPU 2). So is there some incompatibility between the GTX 580 and the other two GPUs? How can this be fixed?
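To make the pattern above easier to reproduce, here is a minimal shell sketch that runs a command once for every non-empty subset of device indices. It is not part of the CUDA samples: the function name probe_subsets is made up for illustration, and it assumes the program being tested exits with a non-zero status when a CUDA call fails (which I believe the samples do, but have not verified for every case).

```shell
#!/bin/sh
# Run a command once for every non-empty subset of GPU indices 0..N-1,
# exposing only that subset through CUDA_VISIBLE_DEVICES.
# Usage: probe_subsets N command [args...]
# NOTE: probe_subsets is a hypothetical helper, not an NVIDIA tool.
probe_subsets() {
    n=$1
    shift
    mask=1
    last=$(( (1 << n) - 1 ))
    while [ "$mask" -le "$last" ]; do
        # Build the comma-separated index list for this bitmask.
        devs=""
        i=0
        while [ "$i" -lt "$n" ]; do
            if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
                devs="${devs:+$devs,}$i"
            fi
            i=$((i + 1))
        done
        # Assumption: a non-zero exit status means the launch failed.
        if CUDA_VISIBLE_DEVICES="$devs" "$@" >/dev/null 2>&1; then
            echo "OK    CUDA_VISIBLE_DEVICES=$devs"
        else
            echo "FAIL  CUDA_VISIBLE_DEVICES=$devs"
        fi
        mask=$((mask + 1))
    done
}
```

With three GPUs it would be called as `probe_subsets 3 ./simpleMultiGPU`, which prints one OK/FAIL line per combination. Exporting CUDA_DEVICE_ORDER=PCI_BUS_ID beforehand should make the indices follow the nvidia-smi ordering instead of CUDA's default ordering.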
More interestingly, the OpenCL version of my application works well on all three GPUs.
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:05:00.0      On |                  N/A |
| 22%   40C    P8    16W / 250W |    329MiB / 12205MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro M6000        Off  | 0000:09:00.0     Off |                  Off |
| 26%   43C    P5    17W / 250W |      1MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 580     Off  | 0000:0B:00.0     N/A |                  N/A |
| 41%   40C    P0    N/A /  N/A |      0MiB /  3004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
Many users have reported the same problem over the past seven years (on this forum - https://devtalk.nvidia.com/search/more/sitecommentsearch/all%20CUDA-capable%20devices%20are%20busy%20or%20unavailable/, on Stack Overflow, and in other places). The problem is all the stranger because OpenCL works fine (despite running on top of the CUDA backend).
What is the reason for this behaviour?