Newbie question about multi-GPU: several issues from a novice user

I have two Quadro 5600 cards in my machine so that I can try some things with multiple GPUs. I have a few questions about that.

I have a program that runs perfectly with one card. It includes multiple kernels and uses some kinds of global memory, like texture and constant memory, which I normally define like this:

texture<float, 2> tex;
__constant__ int d_extInfo[16];


  • In my application I want to run the program separately on each CUDA card I have, which means tex and d_extInfo should be defined locally for each device. How can I do that?

  • There’s a common input for the two. How can I allocate it in CUDA so that both GPUs can access it, or do I have to make two copies of that common input?

  • After running the program on each GPU, I want to combine the results, so I should save the results somewhere both GPUs can read. Can I store the results on the GPU, or should I store them on the CPU and combine them there? I would like to store them on the GPU so that I can exploit my fast GPU combine function.

  • Is there any multi-GPU sample program that shows how to transfer data between GPUs? The multiGPU sample in the SDK is so simple that I don’t find it good enough to understand what happens.

Thank you

You have to use cudaSetDevice() to select the device you want to work with. The best way to handle this is to have one CPU thread per device and bind each thread to a CUDA device/CUDA context with cudaSetDevice().
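A minimal sketch of that one-thread-per-device pattern, assuming pthreads on the host; workerKernel and the buffer sizes are placeholders for your own kernels and data:

```cuda
#include <pthread.h>
#include <cuda_runtime.h>

__global__ void workerKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * i;   // stand-in for the real computation
}

static void *deviceThread(void *arg)
{
    int dev = *(int *)arg;
    cudaSetDevice(dev);             // bind this CPU thread to one GPU;
                                    // all CUDA calls below go to that device

    const int n = 1024;
    float *d_out;
    cudaMalloc((void **)&d_out, n * sizeof(float));
    workerKernel<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaThreadSynchronize();        // wait for the kernel to finish
    cudaFree(d_out);
    return NULL;
}

int main(void)
{
    pthread_t threads[2];
    int ids[2] = {0, 1};
    for (int d = 0; d < 2; ++d)
        pthread_create(&threads[d], NULL, deviceThread, &ids[d]);
    for (int d = 0; d < 2; ++d)
        pthread_join(threads[d], NULL);
    return 0;
}
```

Because each thread has its own context, file-scope symbols like tex and d_extInfo automatically refer to separate memory on each thread's device.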

  1. You define them at file scope (as you do right now). In each device context (i.e. on each GPU) the symbols will refer to memory on that context’s device.
  2. You have to upload your input data to both devices.
  3. You have to download the data from one GPU, upload it to the other, and do your combination there.
  4. As far as I know you cannot transfer data between GPUs directly. Pointers to device memory on different devices cannot be shared between CUDA contexts.
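Steps 2–3 can be sketched as staging through host memory. This is only an illustration under the setup above: each GPU is assumed to have already produced a partial result in its own buffer, the memcpy on each buffer must happen in the CPU thread bound to that buffer's device, and combineKernel stands in for your own GPU combine function:

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

__global__ void combineKernel(float *a, const float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += b[i];        // example combination: elementwise sum
}

void combineResults(float *d0_result,   // partial result on device 0
                    float *d1_result,   // partial result on device 1
                    int n)
{
    float *h_tmp = (float *)malloc(n * sizeof(float));

    // In the CPU thread bound to device 1: download its result to the host.
    cudaMemcpy(h_tmp, d1_result, n * sizeof(float), cudaMemcpyDeviceToHost);

    // In the CPU thread bound to device 0: upload the other result
    // and combine both on the GPU, keeping the combined result there.
    float *d0_other;
    cudaMalloc((void **)&d0_other, n * sizeof(float));
    cudaMemcpy(d0_other, h_tmp, n * sizeof(float), cudaMemcpyHostToDevice);
    combineKernel<<<(n + 255) / 256, 256>>>(d0_result, d0_other, n);
    cudaThreadSynchronize();

    cudaFree(d0_other);
    free(h_tmp);
}
```

The combined result ends up in d0_result on device 0, so you can keep using your fast GPU combine function; the host buffer is only a staging area.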

Thank you. Your answer is quite clear and speaks directly to my problems. It seems the CPU is the only bridge between multiple GPUs. I think it would be better if we had a common memory space for the GPUs rather than going through CPU memory, because that bandwidth is limited. Maybe that is a feature future GPUs should provide when people want to do multi-GPU computing — a GPU network rather than standalone cards.