I have a beginner question about using the CUDA samples on a multi-GPU system. Does storing arrays as VBOs mean that the code will only run efficiently on the first device? If I want to write code that runs on multiple GPUs without graphics, does using a VBO mean data has to be continually moved between devices? Any info appreciated.
Yes, the graphics interoperability features in CUDA currently only achieve high performance when the same GPU is used for both computation and display. If you're using multiple GPUs, you might as well transfer the data back to the host and render from there (since this is what the driver has to do internally anyway).
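To make the "go through the host" path concrete, here is a rough sketch of what one frame might look like: compute on a dedicated CUDA device, stage the results in host memory, and let the GPU that owns the OpenGL context upload them into the VBO. The kernel, function names, and buffer layout are all illustrative assumptions, not code from the CUDA samples:

```cuda
// Hypothetical sketch: multi-GPU without CUDA/GL interop.
// d_pos lives on the compute GPU, h_pos is a host staging buffer,
// and vbo belongs to the GL context on the display GPU.
#include <cuda_runtime.h>
#include <GL/gl.h>   // assumes glBufferSubData is declared (GL 1.5+)

__global__ void updatePositions(float4* pos, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pos[i].y += 0.01f;   // placeholder update
}

void stepAndDraw(GLuint vbo, float4* d_pos, float4* h_pos, int n) {
    cudaSetDevice(0);   // the compute GPU (e.g. a C1060)
    updatePositions<<<(n + 255) / 256, 256>>>(d_pos, n);

    // Stage through host memory instead of using graphics interop;
    // this copy is what the driver would do internally anyway.
    cudaMemcpy(h_pos, d_pos, n * sizeof(float4), cudaMemcpyDeviceToHost);

    // Upload on the GPU that owns the GL context (e.g. an FX4800);
    // no cudaGraphicsGLRegisterBuffer / map-unmap calls are needed.
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0, n * sizeof(float4), h_pos);
    // ...issue draw calls here...
}
```

Using cudaMallocHost for h_pos (pinned memory) would typically speed up the device-to-host copy.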
Note that the nbody demo gets a higher framerate/GFLOPS when I compute on a C1060 and display on an FX4800 than when running everything on the FX4800 (at least it did before Dell replaced half my PC; I haven't tested since). So "only high performance" is maybe a bit too harsh?
I meant that the interoperability functions themselves aren't optimal in this case. Obviously, if your rendering is relatively expensive (as in the n-body demo), you probably will benefit from moving it to another GPU and leaving the first one free to concentrate on the computation.