I often hear people say that NVIDIA video cards have a 16-GPU limit. Is it true? I searched Google a bit, but it looks like no one has actually tested it.
Now, having successfully built a rig with more than 16 GPUs myself, I can tell you this rumor is not true: the so-called 16-GPU limit (for NVIDIA cards) doesn’t exist at all.
The rig I built is a GPU monster with 11 NVIDIA cards (4x GTX 660 Ti, 5x GTX 295, and 2x 9800 GX2). As you can see, I used some old cards to save money. 7 of them are dual-GPU cards, so the total GPU count is 18.
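The GPU count above can be checked with a small sketch. The card counts come from this post; the `GPUS_PER_CARD` table and the helper name `total_gpus` are only illustrative names chosen for this example.

```python
# Card counts as listed in the post.
CARDS = {"GTX 660 Ti": 4, "GTX 295": 5, "9800 GX2": 2}

# GPUs on each card model: the GTX 295 and 9800 GX2 are dual-GPU cards.
GPUS_PER_CARD = {"GTX 660 Ti": 1, "GTX 295": 2, "9800 GX2": 2}


def total_gpus(cards, gpus_per_card):
    """Sum the GPUs contributed by every card model."""
    return sum(count * gpus_per_card[model] for model, count in cards.items())


total_cards = sum(CARDS.values())
dual_gpu_cards = sum(n for m, n in CARDS.items() if GPUS_PER_CARD[m] == 2)

print("cards:", total_cards)
print("dual-GPU cards:", dual_gpu_cards)
print("GPUs:", total_gpus(CARDS, GPUS_PER_CARD))
```

The result matches the 18 devices that the driver and CUDA report in the output further down.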
The motherboard is a Supermicro X9DRX+-F. It has 11 PCIe slots, but all of them are x8 slots rather than the x16 slots these cards are designed for. Using a method similar to the one FASTRA II used, I connected the cards to the motherboard through PCIe extenders.
Here is some detailed system information for the 18-GPU monster:
root@server:~# dmesg | grep "DMI:"
[ 0.000000] DMI: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
root@server:~# lspci | grep NVIDIA | grep -v bridge | grep -v Audio
03:00.0 3D controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
04:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
07:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
08:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
0b:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
0c:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
0f:00.0 3D controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
10:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
11:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
84:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
85:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
88:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
89:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
8c:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
8d:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
8e:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
8f:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
root@server:~# nvidia-smi -pm 1
Persistence mode is already Enabled for GPU 0000:03:00.0.
Persistence mode is already Enabled for GPU 0000:04:00.0.
Persistence mode is already Enabled for GPU 0000:07:00.0.
Persistence mode is already Enabled for GPU 0000:08:00.0.
Persistence mode is already Enabled for GPU 0000:0B:00.0.
Persistence mode is already Enabled for GPU 0000:0C:00.0.
Persistence mode is already Enabled for GPU 0000:0F:00.0.
Persistence mode is already Enabled for GPU 0000:10:00.0.
Persistence mode is already Enabled for GPU 0000:11:00.0.
Persistence mode is already Enabled for GPU 0000:83:00.0.
Persistence mode is already Enabled for GPU 0000:84:00.0.
Persistence mode is already Enabled for GPU 0000:85:00.0.
Persistence mode is already Enabled for GPU 0000:88:00.0.
Persistence mode is already Enabled for GPU 0000:89:00.0.
Persistence mode is already Enabled for GPU 0000:8C:00.0.
Persistence mode is already Enabled for GPU 0000:8D:00.0.
Persistence mode is already Enabled for GPU 0000:8E:00.0.
Persistence mode is already Enabled for GPU 0000:8F:00.0.
All done.
root@server:~# nvidia-smi
Fri Nov 29 08:35:14 2013
+------------------------------------------------------+
| NVIDIA-SMI 5.319.37 Driver Version: 319.37 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 9800 GX2 On | 0000:03:00.0 N/A | N/A |
| N/A 53C N/A N/A / N/A | 3MB / 511MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce 9800 GX2 On | 0000:04:00.0 N/A | N/A |
| 80% 56C N/A N/A / N/A | 3MB / 511MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 295 On | 0000:07:00.0 N/A | N/A |
| N/A 50C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 295 On | 0000:08:00.0 N/A | N/A |
| 41% 49C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 295 On | 0000:0B:00.0 N/A | N/A |
| N/A 51C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX 295 On | 0000:0C:00.0 N/A | N/A |
| 41% 49C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce 9800 GX2 On | 0000:0F:00.0 N/A | N/A |
| N/A 55C N/A N/A / N/A | 3MB / 511MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce 9800 GX2 On | 0000:10:00.0 N/A | N/A |
| 80% 53C N/A N/A / N/A | 3MB / 511MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 8 GeForce GTX 660 Ti On | 0000:11:00.0 N/A | N/A |
| 30% 32C N/A N/A / N/A | 7MB / 2047MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 9 GeForce GTX 295 On | 0000:83:00.0 N/A | N/A |
| N/A 51C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 10 GeForce GTX 295 On | 0000:84:00.0 N/A | N/A |
| 41% 49C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 11 GeForce GTX 660 Ti On | 0000:85:00.0 N/A | N/A |
| 30% 31C N/A N/A / N/A | 7MB / 2047MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 12 GeForce GTX 295 On | 0000:88:00.0 N/A | N/A |
| N/A 53C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 13 GeForce GTX 295 On | 0000:89:00.0 N/A | N/A |
| 41% 51C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 14 GeForce GTX 295 On | 0000:8C:00.0 N/A | N/A |
| N/A 53C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 15 GeForce GTX 295 On | 0000:8D:00.0 N/A | N/A |
| 41% 50C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 16 GeForce GTX 660 Ti On | 0000:8E:00.0 N/A | N/A |
| 30% 31C N/A N/A / N/A | 7MB / 2047MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 17 GeForce GTX 660 Ti On | 0000:8F:00.0 N/A | N/A |
| 30% 32C N/A N/A / N/A | 7MB / 2047MB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
| 2 Not Supported |
| 3 Not Supported |
| 4 Not Supported |
| 5 Not Supported |
| 6 Not Supported |
| 7 Not Supported |
| 8 Not Supported |
| 9 Not Supported |
| 10 Not Supported |
| 11 Not Supported |
| 12 Not Supported |
| 13 Not Supported |
| 14 Not Supported |
| 15 Not Supported |
| 16 Not Supported |
| 17 Not Supported |
+-----------------------------------------------------------------------------+
root@server:~# deviceQuery | head -n39
deviceQuery Starting...
CUDA Device Query (Driver API) statically linked version
Detected 18 CUDA Capable device(s)
Device 0: "GeForce GTX 660 Ti"
CUDA Driver Version: 5.5
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)
( 7) Multiprocessors, (192) CUDA Cores/MP: 1344 CUDA Cores
GPU Clock rate: 1084 MHz (1.08 GHz)
Memory Clock rate: 3104 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 393216 bytes
Max Texture Dimension Sizes 1D=(65536) 2D=(65536, 65536) 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 17 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
root@server:~# deviceQuery | grep "Device [0-9]"
Device 0: "GeForce GTX 660 Ti"
Device 1: "GeForce 9800 GX2"
Device 2: "GeForce GTX 295"
Device 3: "GeForce GTX 295"
Device 4: "GeForce GTX 295"
Device 5: "GeForce GTX 295"
Device 6: "GeForce 9800 GX2"
Device 7: "GeForce 9800 GX2"
Device 8: "GeForce 9800 GX2"
Device 9: "GeForce GTX 295"
Device 10: "GeForce GTX 295"
Device 11: "GeForce GTX 660 Ti"
Device 12: "GeForce GTX 295"
Device 13: "GeForce GTX 295"
Device 14: "GeForce GTX 295"
Device 15: "GeForce GTX 295"
Device 16: "GeForce GTX 660 Ti"
Device 17: "GeForce GTX 660 Ti"
root@server:~# ./nbody --device=17 --numbodies=65536 --benchmark | tail -n 6
gpuDeviceInit() CUDA Device [17]: "GeForce GTX 660 Ti
> Compute 3.0 CUDA device: [GeForce GTX 660 Ti]
number of bodies = 65536
65536 bodies, total time for 10 iterations: 693.433 ms
= 61.938 billion interactions per second
= 1238.754 single-precision GFLOP/s at 20 flops per interaction
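As a sanity check on the nbody figures above, here is a rough sketch of the arithmetic. It assumes the benchmark counts one interaction per ordered pair of bodies (N squared per iteration), which is consistent with the numbers it prints; the function name `nbody_throughput` is made up for this example.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions/s, GFLOP/s) for an all-pairs n-body run."""
    # Assumption: every body interacts with every body, so N*N per iteration.
    interactions = num_bodies ** 2 * iterations
    per_second = interactions / (total_ms / 1000.0)
    return per_second / 1e9, per_second * flops_per_interaction / 1e9


# Values taken from the benchmark output above.
ginteractions, gflops = nbody_throughput(65536, 10, 693.433)
print(f"{ginteractions:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

Both values should come out within rounding of what the benchmark reported for device 17.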
root@server:~# bandwidthTest --device=17
[CUDA Bandwidth Test] - Starting...
Running on...
Device 17: GeForce GTX 660 Ti
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 381.1
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 398.5
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 115168.9
Result = PASS
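To put the bandwidth numbers in perspective, the sketch below compares the host transfers with the on-card copy. It uses only the three figures printed above. The transcript does not say what link width the extenders provide, so this only shows how large the gap is, not what causes it.

```python
# Bandwidth figures (MB/s) copied from the bandwidthTest output above.
HOST_TO_DEVICE = 381.1
DEVICE_TO_HOST = 398.5
DEVICE_TO_DEVICE = 115168.9

# How many times faster the on-card copy is than each host transfer direction.
h2d_gap = DEVICE_TO_DEVICE / HOST_TO_DEVICE
d2h_gap = DEVICE_TO_DEVICE / DEVICE_TO_HOST

print(f"device-to-device is about {h2d_gap:.0f}x faster than host-to-device")
print(f"device-to-device is about {d2h_gap:.0f}x faster than device-to-host")
```

A gap of roughly 300x means that on this rig, workloads that keep their data on the card (like the nbody benchmark) are affected far less than workloads that stream data to and from the host.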