Multi-GPU + Interoperability issues

Hey there,

I work for a medical tech company where we’re developing a surgery product that relies heavily on NVIDIA products. We use 3D Vision for stereo rendering, and GTX cards to power both our graphics and a custom physics engine running on CUDA. A set of haptic devices interacts with the physics engine in both directions.

Our framework basically consists of two GPU parts:

  • Graphics. High load. We use DirectX 10 to render massive point models (similar to NVIDIA Screen-Space Fluids).
  • Physics. High load. This is implemented using CUDA 3.2. Because we require interaction with our haptic devices, which includes calculating a feedback force, it generally runs at 1000 Hz or higher.

(The physics occasionally uses CUDA/DirectX interoperability to update some DirectX vertex buffers so the user can see what’s going on. We use a form of double buffering for this.)
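For readers unfamiliar with the pattern, here is a minimal host-side sketch of this kind of double buffering. It is only an illustration under assumptions: `DoubleBuffer`, `publish`, and `fetch` are hypothetical names, the buffers are plain `std::vector`s rather than DirectX vertex buffers, and a single writer (physics) and a single reader (render) are assumed. It does not show the actual CUDA/DirectX interop calls.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical double buffer: the physics thread publishes snapshots,
// the render thread fetches the most recent one when it is ready for it.
// Assumes exactly one writer thread and one reader thread.
template <typename T>
class DoubleBuffer {
public:
    // Called by the physics thread. The back buffer is private to the
    // writer, so it is filled without holding the lock; only the swap
    // that makes it visible to the reader is locked.
    void publish(const std::vector<T>& data) {
        back_ = data;
        std::lock_guard<std::mutex> lock(mutex_);
        std::swap(front_, back_);
        ++version_;
    }

    // Called by the render thread. Copies the front buffer into `out`
    // only if something new was published since `last_seen`.
    // Returns true when `out` was updated.
    bool fetch(std::vector<T>& out, unsigned long& last_seen) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (version_ == last_seen) {
            return false;  // nothing new since the last fetch
        }
        out = front_;
        last_seen = version_;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<T> front_;       // latest published snapshot
    std::vector<T> back_;        // writer-private staging buffer
    unsigned long version_ = 0;  // incremented on every publish
};
```

In this sketch the physics loop would call `publish` whenever it wants the display updated, and the renderer would call `fetch` once per frame with its own `last_seen` counter starting at 0. The physics side then only ever waits for the duration of a swap or a copy, not for a whole rendered frame.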

Our setup currently consists of an i7 board with dual NVIDIA GTX 480s. Our previous setup ran a single GTX 480, where we noticed that as we put more load (heavier shaders) on the DirectX rendering, the CUDA physics engine would suffer. This made sense, since graphics and physics had to share the same card. So we upgraded to two cards, hoping to offload the physics into its own CUDA context and thread on the second card while keeping the DirectX rendering on the first.

Unfortunately, this isn’t as easy as it sounds. I currently have the physics running in its own CUDA context (and thread) on the secondary card, but it still suffers when I increase the load on DirectX. More specifically, once the rendering framerate drops below the vsync rate, CUDA performance drops with it. Monitoring GPU usage with MSI Afterburner, I can see that the primary card is maxed out while the secondary card running CUDA is far from it. When the DirectX load drops, CUDA performance recovers.
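One way to put numbers on this symptom is to timestamp every physics step and track both the average step rate and the longest gap between two consecutive steps. A long worst-case gap that tracks the render frame time would suggest the physics thread is being stalled rather than simply running slower. A minimal C++ sketch follows; `RateMeter` and its methods are hypothetical names, not part of CUDA or DirectX, and timestamps are passed in as seconds so any clock can be used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: feed it one timestamp (in seconds) per physics step.
class RateMeter {
public:
    void tick(double t_seconds) {
        if (count_ == 0) {
            first_ = t_seconds;
        } else {
            // Timestamps are expected to be non-decreasing.
            assert(t_seconds >= last_);
            max_gap_ = std::max(max_gap_, t_seconds - last_);
        }
        last_ = t_seconds;
        ++count_;
    }

    // Average step rate over all recorded ticks, in Hz (0 if undefined).
    double hz() const {
        if (count_ < 2 || last_ <= first_) {
            return 0.0;
        }
        return static_cast<double>(count_ - 1) / (last_ - first_);
    }

    // Longest interval between two consecutive ticks, in seconds.
    double max_gap() const { return max_gap_; }

private:
    std::size_t count_ = 0;
    double first_ = 0.0;
    double last_ = 0.0;
    double max_gap_ = 0.0;
};
```

Calling `tick` once per physics iteration with a monotonic clock reading, and logging `hz()` and `max_gap()` at different DirectX loads, would show whether the 1000 Hz target is missed uniformly or through occasional long stalls.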

Perhaps I’m taking the wrong approach here? I’ve noticed that when I put an extreme load on DirectX with our application in windowed mode, Windows becomes fairly unresponsive. I verified this on a different configuration as well, so it may well be a logical consequence of how DirectX operates. The question is, how do I implement this properly? Or is it even possible at all?