CUDA & OpenGL Interop memory copy question

Hi,

I just ran a test with the Simple OpenGL sample in the SDK on a system with two GPUs: a Quadro FX 4600 and a GeForce GT 220.
Only one monitor is connected, to one GPU or the other, at any given time.

Test Case 1

Quadro was selected with cudaGLSetGLDevice().
Monitor is attached to Quadro.
Result frame rate: 60 fps.

Test Case 2

Quadro was selected with cudaGLSetGLDevice().
Monitor is attached to GeForce.
Result frame rate: 400 fps.

Test Case 3

GeForce was selected with cudaGLSetGLDevice().
Monitor is attached to Quadro.
Result frame rate: 60 fps.

Test Case 4

GeForce was selected with cudaGLSetGLDevice().
Monitor is attached to GeForce.
Result frame rate: 400 fps.

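These results are interesting: the frame rate follows the GPU the monitor is attached to, not the GPU selected with cudaGLSetGLDevice().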
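
To make the pattern in the four test cases explicit, here is a minimal sketch in plain C++ (no CUDA calls) that tabulates them and checks which factor is consistent with the measured frame rate. The struct and function names are made up for illustration; only the GPU names and fps values come from the tests above.

```cpp
#include <array>
#include <map>
#include <string>

// One row per test case from the post.
struct TestCase {
    std::string selectedGpu;  // GPU passed to cudaGLSetGLDevice()
    std::string monitorGpu;   // GPU the monitor was attached to
    int fps;                  // measured frame rate
};

const std::array<TestCase, 4> kCases = {{
    {"Quadro",  "Quadro",  60},
    {"Quadro",  "GeForce", 400},
    {"GeForce", "Quadro",  60},
    {"GeForce", "GeForce", 400},
}};

// Returns true if all cases that share the same key value also share
// the same frame rate, i.e. the key alone is enough to predict fps.
template <typename KeyFn>
bool determinesFps(KeyFn key) {
    std::map<std::string, int> seen;
    for (const TestCase& c : kCases) {
        auto result = seen.emplace(key(c), c.fps);
        if (!result.second && result.first->second != c.fps) {
            return false;  // same key value, different fps
        }
    }
    return true;
}

// Hypothetical helpers: does the monitor GPU / the selected GPU predict fps?
bool monitorDeterminesFps() {
    return determinesFps([](const TestCase& c) { return c.monitorGpu; });
}

bool selectionDeterminesFps() {
    return determinesFps([](const TestCase& c) { return c.selectedGpu; });
}
```

With the numbers above, monitorDeterminesFps() returns true and selectionDeterminesFps() returns false. This only summarizes the measurements; it says nothing about where the memory actually lives.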

This makes me wonder how the memory was allocated on the devices.
And I came up with this hypothesis:

  1. OpenGL memory is allocated on all devices.
  2. The CUDA result is written to the memory allocated on each GPU at the same time.
  3. The GPU connected to the monitor renders to the screen.

Could someone please help me verify this?
Is the above correct, or am I completely wrong?

Thank you.