Can we read/write from one GPU's RAM to another with CUDA 2.1?


I have a GTX 295 with two GPUs, and I would like to know a few things about programming such a device with CUDA 2.1:

  • Can we access the second GPU's RAM from a kernel running on the first GPU?
  • Can we copy a buffer from the first GPU's RAM to the second one?
  • Should we create a new host thread to launch a kernel on another GPU?

Thank you



Not directly. You have to copy the data from device #1 to the host, then copy it from the host to device #2. Moreover, using two GPUs at the same time requires two host threads, and in CUDA 2.1 there is no way to mark a block of memory as page-locked for both of them. This means that either the device1->host or the host->device2 copy will be slow. CUDA 2.2 fixes this and allows multiple host threads to share page-locked host memory. (See also the answer to the next question.)

Yes, one host thread per GPU is pretty much the only way to do it. A given host thread can only be associated with one CUDA device at a time. There is a handy C++ class called GPUWorker which handles the host threads for you.
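To make the two answers above concrete, here is a minimal sketch of the staged copy with one host thread per GPU. It is not GPUWorker and not anyone's official method, just an illustration using plain pthreads. The names `CopyJob`, `CHECK`, `fill`, `source_thread` and `dest_thread` are made up for this example, and it assumes the two GPUs show up as devices 0 and 1. The staging buffer comes from ordinary `malloc`, so it is pageable and the transfers will be on the slow side, as described above. On CUDA 2.2 or later you could presumably allocate it with `cudaHostAlloc` and the `cudaHostAllocPortable` flag instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <cuda_runtime.h>

// Hypothetical job description handed to each host thread.
struct CopyJob {
    int    device;  // CUDA device this host thread binds to
    float *host;    // staging buffer in pageable host memory
    int    count;   // number of floats to move
    int    ok;      // set to 1 by the thread on success
};

// Hypothetical helper: report a CUDA error and leave the thread function.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s: %s\n", #call,                        \
                    cudaGetErrorString(err_));                        \
            return NULL;                                              \
        }                                                             \
    } while (0)

// Trivial kernel so the source GPU has something to send.
__global__ void fill(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Host thread for the first GPU: produce data, copy device -> host.
static void *source_thread(void *arg) {
    CopyJob *job = (CopyJob *)arg;
    size_t bytes = job->count * sizeof(float);
    float *d_src = NULL;
    CHECK(cudaSetDevice(job->device));
    CHECK(cudaMalloc((void **)&d_src, bytes));
    fill<<<(job->count + 255) / 256, 256>>>(d_src, job->count, 42.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(job->host, d_src, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_src));
    job->ok = 1;
    return NULL;
}

// Host thread for the second GPU: copy host -> device, then spot-check.
static void *dest_thread(void *arg) {
    CopyJob *job = (CopyJob *)arg;
    size_t bytes = job->count * sizeof(float);
    float *d_dst = NULL;
    float first = 0.0f;
    CHECK(cudaSetDevice(job->device));
    CHECK(cudaMalloc((void **)&d_dst, bytes));
    CHECK(cudaMemcpy(d_dst, job->host, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(&first, d_dst, sizeof(float), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_dst));
    job->ok = (first == 42.0f);
    return NULL;
}

int main() {
    int devices = 0;
    if (cudaGetDeviceCount(&devices) != cudaSuccess || devices < 2) {
        fprintf(stderr, "this sketch needs two CUDA devices\n");
        return 1;
    }
    const int count = 1 << 20;
    float *staging = (float *)malloc(count * sizeof(float));
    if (!staging) return 1;

    CopyJob src = {0, staging, count, 0};  // assumed: first GPU is device 0
    CopyJob dst = {1, staging, count, 0};  // assumed: second GPU is device 1
    pthread_t t;

    // Run the two threads one after the other so no extra locking is needed.
    pthread_create(&t, NULL, source_thread, &src);
    pthread_join(t, NULL);
    if (src.ok) {
        pthread_create(&t, NULL, dest_thread, &dst);
        pthread_join(t, NULL);
    }
    printf("copy %s\n", (src.ok && dst.ok) ? "succeeded" : "failed");
    free(staging);
    return (src.ok && dst.ok) ? 0 : 1;
}
```

Something like `nvcc staged_copy.cu -o staged_copy -lpthread` should build it (the file name is arbitrary). The threads run back to back only to keep the sketch short. In a real application both threads would stay alive and you would signal between them when the staging buffer is ready.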

Interesting… is there any possibility of using something like SLI for passing data directly between GPUs, or is this not an avenue CUDA devs would be interested in?

Comments from NVIDIA employees in the past suggest the SLI link is actually not that fast. PCI-Express, however, is designed to allow devices to directly communicate with each other, so in principle a GPU-to-GPU copy could be done over that link at 3 or 6 GB/sec. This capability is not present in CUDA yet, though people have asked for it.