I was wondering if anyone has experimented with using the SLI bridge and OpenGL to transfer data from one GPU to a second GPU, then interoperating back to CUDA or OpenCL? This seems like a possible way to transfer data between two GPUs instead of going from GPU A back to host memory, then from host memory to GPU B.
NVIDIA doesn’t expose any API for third-parties to use the SLI bridge manually. Also, there is a lot of suspicion that this bridge is not a high-bandwidth link, and that large chunks of data are exchanged over the PCI-Express bus rather than the bridge itself.
Yes, seibert’s right. It’s all undocumented, but previous discussions here pretty much concluded that the SLI hardware link is low bandwidth but offers low, predictable latency… useful for synchronizing timing more than for moving data. That’s important for realtime graphics.
Though, Mesa, you brought up an interesting question: is there a way in OpenGL to transfer data from one GPU to another via DMA (not via host memory)?
NVIDIA’s SLI FAQ mentions that the SLI bridge provides 1Gbps dedicated bandwidth:
Interesting! Though that document actually says 1 GB/sec (8x higher than 1 Gbps), which is pretty respectable. The PCI-E bus is still much faster at 5 GB/sec or more, but it is interesting to see that rendered pixel data is communicated over this separate channel.
It would be interesting if this were exposed in the CUDA API.
For example, I think some dual-Xeon machines (with 8 GPU slots) can have multiple PCI-e host controllers, preventing some GPUs from communicating with each other directly.
If we could communicate between GPUs through SLI bridges, we could patch the shortcomings of any motherboard ourselves.
Perhaps it could even work transparently, routing through an SLI bridge if no (better) path through PCI-e from one device to another can be found.
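For reference, CUDA did eventually expose roughly this idea (over PCI-e, not over the SLI bridge) with the peer-to-peer API introduced in CUDA 4.0. A minimal sketch of the transparent-routing behavior described above, with error checking omitted for brevity:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int canAccess = 0;
    // Ask whether device 0 can read/write device 1's memory directly
    // (i.e. whether a direct PCI-e path exists between them).
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);

    size_t bytes = 1 << 20;
    void *src, *dst;
    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    if (canAccess) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // flags must be 0
    }
    // cudaMemcpyPeer routes transparently: direct P2P over PCI-e
    // when possible, otherwise staged through a host bounce buffer.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);

    printf("direct peer access: %s\n", canAccess ? "yes" : "no");
    return 0;
}
```

Whether peer access is available depends on the topology: two GPUs under separate host controllers, as in the dual-Xeon case above, may report `canAccess == 0` and fall back to staging through host memory.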
This is an interesting concept that I also looked into some months ago. More and more I feel that PCI-E is going to become the big bottleneck in HPC. I wonder if NVIDIA will look into developing some high-bandwidth peer-to-peer add-ons for their cards (similar to SLI but with much heavier-duty bandwidth specs). My current code spends half of its time transferring data between GPUs (and this is simple conjugate gradient, not an unreasonable problem to solve…)
PCI-E is a tough interconnect to beat. With version 3.0, the theoretical bandwidth of an x16 slot is just under 16 GB/sec. Assuming practical efficiencies similar to PCI-E 2.0, you ought to be able to move 10-12 GB/sec through a PCI-E 3.0 interface. For a 64-wire interface, that’s not bad. A single 64-bit DDR3-1600 bus can transfer 12.8 GB/sec theoretically.
It’s a lot more cost effective to make sure the firmware and software people don’t screw up the utility of the PCI-E interface with BIOS bugs, or other limitations.
I should say, the one place where NVIDIA could win with a custom HPC interface is low, predictable latency, which is presumably why the 1 GB/sec SLI pixel interface exists. Certainly having something like HyperTransport between GPUs would be really interesting (and extremely expensive).