Transfer data from PCIe device to GPU memory? Why not incorporate the SLI bridge into CUDA?

Hi, I have read the FAQ and numerous posts to these forums. I admit that I am new to CUDA and SLI. This is not necessarily a gaming question but one related to parallel processing.

In the FAQ, there is a question that was asked: “Is it possible to DMA directly into GPU memory from another PCI device?” The answer was “no, but we are working on ways to do that”.

Also, about two years ago, someone on a CUDA forum asked if it was possible to transfer data from one GPU to another over the SLI bridge, with CUDA. Again, the answer was “no, but we are looking at doing that in future releases of CUDA”.

As far as I can tell, the SLI bridge interface is proprietary. I have even found the patent that NVIDIA has filed concerning the SLI bridge, and it says basically nothing about the interface other than that it exists.

It seems to me that many data-transfer problems in parallel processing could be addressed by using the SLI bridge, both between graphics cards and with other PCIe cards that could be built to connect to it.

Why not make the SLI bridge interface available to CUDA users?

Thank You
Tom

We’ve asked about this, and the not-very-direct response has been that the SLI bridge is not as high bandwidth as you think. (It’s not even clear how much data goes over it.) A point-to-point copy over the PCI-Express bus between devices would be much faster.
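To give a sense of what a point-to-point copy over PCI-Express looks like in CUDA terms, here is a minimal sketch using the peer-to-peer API (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer). This assumes a CUDA release and platform that actually expose peer access between your two devices, and all error checking is omitted:

[code]
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int canAccess = 0;
    /* Ask the driver whether device 0 can directly address device 1. */
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) {
        printf("No peer-to-peer path between devices 0 and 1\n");
        return 1;
    }

    size_t bytes = (size_t)64 << 20;   /* 64 MB test buffer */
    void *src = NULL, *dst = NULL;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  /* map device 1 into device 0's address space */
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    /* Direct GPU-to-GPU copy; the driver routes this over PCI-Express
       without staging through host memory. */
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();
    return 0;
}
[/code]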

I am not sure about that, because SLI gaming benchmarks are faster with the SLI bridge.

When games are running in SLI mode, I’m pretty sure both GPUs have complete copies of everything (textures, geometry, etc.) and they just render half the screen each (interleaved, top/bottom, or maybe some other split). That data (half the screen) would have to get transferred over the link so that the complete screen image ends up in the framebuffer of one GPU. So assuming a resolution of 1920 x 1080 at 60 Hz with 32 bits per pixel, that’s about 250 MB/sec, which is significantly slower than the PCIe bus.
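Spelling that arithmetic out (these are just my round numbers, nothing measured):

[code]
#include <stdio.h>

int main(void)
{
    double pixels          = 1920.0 * 1080.0 / 2.0;   /* half the screen */
    double bytes_per_frame = pixels * 4.0;            /* 32 bits per pixel */
    double bytes_per_sec   = bytes_per_frame * 60.0;  /* 60 Hz */
    printf("%.0f MB/sec\n", bytes_per_sec / (1024.0 * 1024.0));
    /* prints 237 MB/sec, i.e. roughly 250 MB/sec */
    return 0;
}
[/code]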

OK, but when I did my tests, benchmarks were better with the SLI bridge than without the bridge, so your calculations must have some error.

You can run SLI games without the bridge? How much difference is there? (Again, this is no guarantee that lots of information is passing over the bridge.)

Yes, you can run it without the bridge and it goes through PCIe. The difference is negligible, but with the bridge it is faster.
I do not possess knowledge of how it works …

OK, if the difference is negligible, then I still don’t hold out any hope for the SLI bridge being very helpful for data transfer.

The logical guess is that the SLI bridge gives a low bandwidth but low and fixed latency connection between the GPUs. That’s important for graphics rendering since you need to synchronize frame data at a reliable realtime rate. The PCIe bus has good bandwidth on average but no latency guarantees.

I’d concur with the low-bandwidth-but-low-latency theory. If I was identifying the cable correctly (and I’m going back a couple of years), the SLI bridge contains about eight wires. That’s obviously rather less than a PCIe x16 slot. In addition to the low and fixed latency, the signaling protocol can also be optimised for exactly the sort of things two NVIDIA GPUs require for graphics, rather than being fully general. I’m speculating, but that could make the SLI bridge bandwidth effectively higher, though only for graphics.

I think that the SLI bridge must have high throughput, because, for example, I have a GTX 295, which is two cards in one package, and the second card is connected only by an SLI bridge to the first one, which is plugged into PCIe.

How do you know that?

http://www.techpowerup.com/reviews/Zotac/GeForce_GTX_295/images/disasm12.jpg

There is only an SLI bridge, nothing else, and this card is very fast.
It would be interesting if somebody could explain it.

Interesting. Does this really show as 2 CUDA devices if SLI is switched off? And do both devices achieve similar DMA speeds?

What is interesting is that CUDA shows 2 devices ONLY if SLI is activated (in Windows).
They have similar DMA speeds.
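If anyone wants to repeat the check, here is a rough sketch of one way to do it, along the lines of NVIDIA’s bandwidthTest sample: enumerate the devices, then time one pinned host-to-device copy per device (error checking omitted; the buffer size is arbitrary):

[code]
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("CUDA devices visible: %d\n", count);

    size_t bytes = (size_t)64 << 20;   /* 64 MB transfer */
    void *host = NULL;
    cudaMallocHost(&host, bytes);      /* pinned, so the copy reflects real DMA speed */

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        void *dptr = NULL;
        cudaMalloc(&dptr, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);
        cudaMemcpy(dptr, host, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("device %d: %.2f GB/sec host-to-device\n",
               dev, ((double)bytes / (ms / 1000.0)) / 1e9);

        cudaFree(dptr);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }
    cudaFreeHost(host);
    return 0;
}
[/code]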

OK, this is very curious.

We have been told that the two halves of a GTX 295 share the PCI-Express link through NVIDIA’s NF200 PCI-Express switch, and they show up as separate PCI-Express devices to the operating system. (Not going to comment on Windows hiding the device or not, since I don’t know anything about those drivers.) An x16 PCI-Express link normally requires 64 wires: 16 lanes × 2 (send/receive) × 2 (wires per differential pair) = 64.

If you look at a normal NVIDIA SLI bridge (not the flat cable in the disassembled picture above), you will find that there are 26 wires total. However, if you go to the techpowerup.com page and look at some of the other photos, they have one at the end where they seem to have removed the flat cables connecting the two PCBs together:

http://www.techpowerup.com/reviews/Zotac/GeForce_GTX_295/images/screwed.jpg

It’s impossible to tell how many pins there are (and there appear to be two cables stacked on each other!), but probably more than 26, and perhaps even 64. Although the techpowerup article calls the connection an “SLI bridge”, I think that was just an assumption with no clear evidence to support it.

Anyway, this is all speculation. NVIDIA has never expressed any interest in giving anyone access to the SLI link, whatever technology it turns out to be, because they clearly want to keep it proprietary. Sleuthing out the exact capabilities is probably not going to change their mind. :)