I have a Linux system with 4 GTX 1070s, all the exact same manufacturer/model. Everything works perfectly when SLI is completely disabled. When I enable SLI I get the following nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.42                 Driver Version: 390.42                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:02:00.0  On |                  N/A |
|  0%   46C    P8     9W / 180W |     47MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   47C    P8     9W / 180W |     47MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1070    Off  | 00000000:87:00.0 Off |                  N/A |
|  0%   37C    P8    13W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1070    Off  | 00000000:88:00.0 Off |                  N/A |
|  0%   46C    P8     9W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2342      G   /usr/lib/xorg/Xorg                            14MiB |
|    1      2342      G   /usr/lib/xorg/Xorg                            14MiB |
+-----------------------------------------------------------------------------+
Only while the X server with SLI is running do the first two GPUs report a total memory of 8191 MiB; otherwise all four GPUs report 8119 MiB. I don't think that discrepancy is the problem, though (or at least I hope not).
If I enable SLI on the first two GPUs (indices 0,1), I can still use CUDA, but only on the pair with SLI or on the pair without SLI, which I make visible to the CUDA application by setting the environment variable CUDA_VISIBLE_DEVICES="0,1" or CUDA_VISIBLE_DEVICES="2,3". Even if I make all four GPUs visible to the CUDA application (confirmed via cudaGetDeviceCount()) but only use one of the pairs, everything works fine.
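For reference, this is roughly how I enumerate the visible devices (a minimal sketch rather than my actual code; the binary name in the comments is made up, but the calls are the standard CUDA runtime API):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Run as e.g.  CUDA_VISIBLE_DEVICES=0,1 ./enum   (the SLI pair)
    //         or   CUDA_VISIBLE_DEVICES=2,3 ./enum   (the non-SLI pair)
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("visible devices: %d\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Print the name and PCI bus ID so the CUDA index can be
        // matched against the Bus-Id column in nvidia-smi above.
        printf("  %d: %s (PCI bus %02x)\n", i, prop.name, prop.pciBusID);
    }
    return 0;
}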
Now, when I make all four GPUs visible, the application always fails with cudaErrorDevicesUnavailable as soon as it tries to do anything on the second pair of GPUs, e.g., creating one stream per device in a loop fails when stream creation reaches the first GPU of the second pair, no matter which pair is visible first. I noticed this first in my own application, but I can reproduce the same issue with the sample applications 0_Simple/simpleMultiGPU and 4_Finance/MonteCarloMultiGPU from the CUDA toolkit.
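The failing pattern boils down to something like this (a minimal sketch; the toolkit samples above do essentially the same thing):

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);  // reports 4 when all GPUs are visible
    std::vector<cudaStream_t> streams(count);
    for (int i = 0; i < count; ++i) {
        cudaSetDevice(i);
        // With all four GPUs visible, this returns
        // cudaErrorDevicesUnavailable as soon as i reaches the
        // first GPU of the second pair.
        cudaError_t err = cudaStreamCreate(&streams[i]);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", i, cudaGetErrorString(err));
            return 1;
        }
    }
    printf("created one stream on each of %d devices\n", count);
    return 0;
}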
When I run two separate processes, each addressing one pair, everything works as well, so I can narrow the problem down to a single process trying to access both pairs. My application is quite complex, with both CUDA code for computing and CUDA-GL interoperability for rendering, so decoupling the compute and rendering parts into separate processes is not really an option.
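Concretely (with a hypothetical binary name), this works:

CUDA_VISIBLE_DEVICES=0,1 ./myapp &
CUDA_VISIBLE_DEVICES=2,3 ./myapp &

while a single ./myapp that sees all four devices fails as described above.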
I can't find documentation anywhere on whether this is expected behavior or not. Could someone point out if I missed something?
Thanks,
Peter