So I have two Quadros in a machine, each with two monitors attached, and I’m rendering a single scene across the four monitors in a single OpenGL context / viewport / whatever. I want to implement a particle system in CUDA by mapping OpenGL vertex buffers and modifying the vertices each frame. I just realized that with the runtime API, at least, I have to specify which CUDA device (i.e. which Quadro?) I want to use with OpenGL. But within OpenGL itself there is no distinction between devices, so how exactly could / should / would this work? For example, if one device simulates all the particles, will the vertex buffer be updated on both devices (presumably it would), and would that hurt my bus bandwidth? Could both GPUs simulate all the particles, thus avoiding bus utilization? Would I need two OpenGL contexts, one per graphics card, for that to work?
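For reference, here is roughly what I mean by "mapping OpenGL vertex buffers and modifying the vertices each frame", sketched with the CUDA 2.x-era runtime interop API on a single device (the kernel, buffer names, and simulation step are all placeholders):

```cpp
// Sketch: per-frame update of an OpenGL VBO from a CUDA kernel.
// Assumes the VBO was created with glGenBuffers/glBufferData, and that
// cudaGLSetGLDevice() was called before any other CUDA work.
#include <GL/gl.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

__global__ void advanceParticles(float4* pos, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        pos[i].y += dt;  // stand-in for the real simulation step
}

void updateVBO(GLuint vbo, int numParticles, float dt) {
    float4* dPos = 0;
    cudaGLRegisterBufferObject(vbo);           // in real code, register once at init
    cudaGLMapBufferObject((void**)&dPos, vbo); // get a device pointer to the VBO
    advanceParticles<<<(numParticles + 255) / 256, 256>>>(dPos, numParticles, dt);
    cudaGLUnmapBufferObject(vbo);              // give the buffer back to OpenGL
}
```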
Currently, for multi-GPU systems I would recommend not using OpenGL interop and instead just reading the data back to the host (this is what the driver has to do in this case anyway). You could also use asynchronous transfers and double buffering to hide the transfer cost.
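Something like this, to be concrete: simulate frame N on the device and start its readback, while OpenGL uploads frame N-1, which has already arrived. The host buffers need to be pinned (`cudaMallocHost`) for the copy to actually be asynchronous; all names here are placeholders:

```cpp
// Sketch: double-buffered async readback from CUDA to the host.
// dPos[2]: device buffers; hPos[2]: pinned host buffers (cudaMallocHost).
#include <cuda_runtime.h>

void frame(float4* dPos[2], float4* hPos[2], int n,
           cudaStream_t stream, int frameIdx) {
    int cur = frameIdx & 1, prev = cur ^ 1;

    // Kick off this frame's simulation and readback on the stream.
    advanceParticles<<<(n + 255) / 256, 256, 0, stream>>>(dPos[cur], n, 0.016f);
    cudaMemcpyAsync(hPos[cur], dPos[cur], n * sizeof(float4),
                    cudaMemcpyDeviceToHost, stream);

    // Meanwhile, upload the *previous* frame's completed data to the VBO:
    // glBufferSubData(GL_ARRAY_BUFFER, 0, n * sizeof(float4), hPos[prev]);

    cudaStreamSynchronize(stream);  // in real code, sync before reusing hPos[cur]
}
```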
BTW, the GPU affinity extension does allow you to control which GPU an OpenGL context gets created on:
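Usage is roughly like this (this is WGL_NV_gpu_affinity, so Windows + Quadro only; the extension entry points would be fetched with `wglGetProcAddress` in real code, and error handling is omitted):

```cpp
// Sketch: create an OpenGL context pinned to a specific GPU.
#include <windows.h>
#include <GL/wglext.h>  // HGPUNV, affinity extension typedefs

HGPUNV gpuList[2] = { 0 };        // NULL-terminated list
wglEnumGpusNV(0, &gpuList[0]);    // pick the first GPU in the system
HDC   affinityDC = wglCreateAffinityDCNV(gpuList);
HGLRC ctx        = wglCreateContext(affinityDC);  // context lives on that GPU
```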
This is not how it works in 2.2 if the display card is a Quadro. Instead of going through the host, it uses (dare I say it?) peer-to-peer transfers and is much faster than before…
Is there a way to benchmark this? I have a FX4800 and a C1060.
OGL interop between multiple devices is a lot faster in 2.2 than before? I don’t know, I’m not an OGL guy.
Hehe, maybe I can test something later this week. Later today I get a delivery of 20 PCs with GTX 285s, so I can swap out the FX4800 and test the difference for kicks.