Memory on SLI and CUDA: how is the combined memory presented?

Hello. I am analysing the possibility of porting a "HUGE images" processing system to CUDA. I would like to know how the combined memory of an SLI system is presented. For example, if I have 2 cards with 1 GB of memory each, can I split my 2 GB data set between the two? In other words, do I have a grand total of 2 GB to use?

Not with SLI. Right now you'll just see one 1 GB device. Basically, the bandwidth between SLI'd cards is not high enough to scale transparently, so you have to treat them as completely separate devices.

But then can I run one program on one of the cards and another (let's suppose the same code) on the other card? The algorithm I use has no data dependencies between the partitions, so the speed of SLI is irrelevant for MY algorithm. I just need to be able to put half the problem on one card and half on the other (the data is 100% different), so I can finish the processing in about half the time.

Otherwise, what is the advantage of having several GPUs or Teslas?

Of course you can do this. Either run two instances of your application and call cudaSetDevice with a different device ID in each, or run two host threads in one application, again calling cudaSetDevice with a different device in each thread.
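A minimal sketch of the two-host-threads approach, assuming the CUDA runtime API and device IDs 0 and 1 (the chunk size, the `processHalf` name, and the kernel details are placeholders, not anything from the thread above); it needs the CUDA toolkit and two visible devices to actually run:

```cpp
#include <cstddef>
#include <thread>
#include <cuda_runtime.h>

// Each host thread binds to its own device and processes its own,
// completely independent, half of the data set.
void processHalf(int device, const float* hostChunk, size_t n) {
    cudaSetDevice(device);                 // bind this host thread to one GPU
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, hostChunk, n * sizeof(float), cudaMemcpyHostToDevice);
    // ... launch the same kernels on this device's half of the images ...
    cudaFree(d);
}

int main() {
    const size_t half = 1 << 20;           // placeholder: elements per half
    float* data = new float[2 * half];

    std::thread t0(processHalf, 0, data, half);         // GPU 0: first half
    std::thread t1(processHalf, 1, data + half, half);  // GPU 1: second half
    t0.join();
    t1.join();
    delete[] data;
}
```

Since the two halves share no data, the threads never need to synchronize with each other beyond the final join, which is why SLI bandwidth doesn't matter here.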

tmuarray was just trying to tell you that SLI isn't a magical fairy that makes two CUDA devices look like one. In fact, you'll have to disable SLI to get the two CUDA devices to show up, as SLI works at a very low level in the driver. You can even run multiple CUDA devices on motherboard chipsets that don't support NVIDIA SLI!

If you have SLI off, you'll see the separate devices, and you can do anything you want on each of them.
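With SLI disabled, a quick enumeration loop confirms that each card appears as its own device with its own memory; this is a standard runtime-API sketch (nothing specific to the poster's system) and needs the CUDA toolkit to compile:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);   // with SLI off, each GPU is listed separately
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Each device reports only its own memory: two 1 GB cards show up
        // as two devices with ~1024 MB each, not one device with 2 GB.
        printf("Device %d: %s, %zu MB\n", i, prop.name,
               prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```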