CUDA and OpenGL interoperability

Hello,

For testing purposes I managed to combine some of my code with the program written for chapter 9 of this book: https://www.amazon.com/CUDA-Application-Design-Development-Farber/dp/0123884268

The code makes some calculations and then plots them in a window using the OpenGL and GLUT libraries. When combining CUDA and OpenGL one has to specify the GPU using the cudaGLSetGLDevice() function. When there is only one card there is nothing to decide; however, I have two cards in my computer, and sometimes people use the main card (which drives X) for calculations. In that case I would like to do my calculations on the other card but still draw my plots.
How can I select this on a system with two or more GPUs? I would like to run the CUDA part on an arbitrary GPU in the system and then plot the results.

It seems that the answer is simply to call cudaGLSetGLDevice(n) and it will work without problems: the driver will divide the work properly, with the GL part on the main card and the CUDA calculations on device n.
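A minimal sketch of that setup might look like the following. This is not the book's code; the device index handling and window setup are my assumptions, and note that cudaGLSetGLDevice() was deprecated as of CUDA 5.0, where a plain cudaSetDevice() call is sufficient:

```cpp
// Sketch: run CUDA on a chosen GPU while OpenGL renders on the display GPU.
// Requires the CUDA toolkit and GLUT; needs an NVIDIA GPU and an X display.
#include <cstdlib>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <GL/glut.h>

int main(int argc, char** argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutCreateWindow("GLDemo");   // the GL context lives on the display GPU

    // Hypothetical convention: take the compute device index from argv.
    int computeDev = (argc > 1) ? atoi(argv[1]) : 0;

    // Associate the CUDA device with the current GL context.
    // Deprecated since CUDA 5.0; there, cudaSetDevice(computeDev) suffices.
    cudaGLSetGLDevice(computeDev);

    // ... register buffers with cudaGraphicsGLRegisterBuffer(), launch
    // kernels, map/unmap each frame, then enter glutMainLoop() ...
    return 0;
}
```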

It’s been a while since I’ve done it, but I believe that should be correct. There are a couple of presentations that may be of interest:

http://on-demand.gputechconf.com/gtc/2012/presentations/S0267A-Mixing-Graphics-and-Compute-with-Multiple-GPUs-Part-A.pdf
http://on-demand.gputechconf.com/gtc/2012/presentations/S0267B-Mixing-Graphics-and-Compute-with-Multiple-GPUs-Part-B.pdf

Some of the takeaways:

“The driver will do all the heavy lifting but…” (this assumes both GPUs are NVIDIA GPUs of appropriate compatibility)

“CUDA-OpenGL interop will perform slower if OpenGL context spans multiple GPU!”
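To check for that last situation, the runtime can report which CUDA device(s) back the current GL context; a sketch using cudaGLGetDevices(), which must be called while a GL context is current:

```cpp
// Sketch: list the CUDA device(s) spanned by the current OpenGL context.
// Needs an NVIDIA GPU and a current GL context to return anything useful.
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

void printGLDevices() {
    unsigned int count = 0;
    int devices[8];
    if (cudaGLGetDevices(&count, devices, 8, cudaGLDeviceListAll)
            == cudaSuccess) {
        printf("GL context spans %u CUDA device(s):", count);
        for (unsigned int i = 0; i < count; ++i)
            printf(" %d", devices[i]);
        printf("\n");
    }
}
```

If the count comes back greater than one, the interop slowdown quoted above may apply.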

Thanks for the reply. I will check the presentations. I tried on a computer with two 1080 cards.

By setting the device with cudaGLSetGLDevice() the program seems to work without problems. The speed is not an issue as long as it does not change by a factor of 10; in my case the fps seems to have stayed the same.

This is the nvidia-smi output for my program ./GLDemo:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1145    G   /usr/lib/xorg/Xorg                             171MiB |
|    0      1725    G   compiz                                         113MiB |
|    0     16887  C+G   ./GLDemo                                        14MiB |
|    1     14421    C   gmx                                            131MiB |
|    1     16887    C   ./GLDemo                                       121MiB |
+-----------------------------------------------------------------------------+

I suppose that in this case the math was done on GPU 1, while the plotting was done on GPU 0.

In the second case I got this:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1145    G   /usr/lib/xorg/Xorg                             177MiB |
|    0      1725    G   compiz                                         113MiB |
|    0     17039  C+G   ./GLDemo                                       120MiB |
|    1     14421    C   gmx                                            131MiB |
+-----------------------------------------------------------------------------+

Everything was done on the GPU 0.
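To keep an eye on this placement while the demo runs, something like the following shell sketch can help (the query field names are an assumption about a reasonably recent nvidia-smi):

```shell
# Refresh the full nvidia-smi report every second while GLDemo runs.
watch -n 1 nvidia-smi

# Or list only the compute processes together with the GPU they are bound to.
# Note: G-type graphics clients such as Xorg are not shown by this query.
nvidia-smi --query-compute-apps=gpu_bus_id,pid,process_name,used_memory \
           --format=csv
```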

I only do the plotting for show-off and testing purposes. For production it will not be feasible anyway, because we are going to run huge system sizes, which would result in one frame every several seconds.