CUDA Multi-GPU with OpenGL interop

Hi,

I currently have a program that uses CUDA to generate an image.
This image is rendered using OpenGL interoperability (via a PBO).
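
For reference, here is a minimal sketch of that single-GPU PBO interop using the CUDA runtime graphics API (the kernel name and launch configuration are placeholders, not my actual code):

```cuda
// Sketch: CUDA writes the image into an OpenGL pixel buffer object (PBO),
// which OpenGL then uses as the pixel source for display.
// generateImage is a hypothetical placeholder kernel.
#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

__global__ void generateImage(uchar4 *out);   // fills the 1024x1024 image

static GLuint pbo;
static cudaGraphicsResource_t pboRes;

void initInterop() {
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, 1024 * 1024 * 4, nullptr, GL_DYNAMIC_DRAW);
    // Register the PBO with CUDA; WriteDiscard because CUDA only writes to it.
    cudaGraphicsGLRegisterBuffer(&pboRes, pbo, cudaGraphicsRegisterFlagsWriteDiscard);
}

void renderFrame() {
    uchar4 *d_ptr = nullptr;
    size_t bytes = 0;
    cudaGraphicsMapResources(1, &pboRes, 0);
    cudaGraphicsResourceGetMappedPointer((void **)&d_ptr, &bytes, pboRes);
    generateImage<<<dim3(64, 64), dim3(16, 16)>>>(d_ptr);
    cudaGraphicsUnmapResources(1, &pboRes, 0);   // hand the PBO back to GL
    // ...then glTexSubImage2D from the bound PBO and draw a textured quad...
}
```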

I want to move this to a dual-GPU setting, where half the image is calculated by each GPU
(in general, small parts of the image would be computed by several GPUs).

I want to know the fastest way of displaying these ‘pieces’ of the image as one full image, using OpenGL.

Specifically, I am wondering whether it is possible to create one OpenGL context and one CUDA context on each GPU, and render the partial images together in a single window.
Also, if I use a single OpenGL context, is it possible to use OpenGL interop where the OpenGL context is on one GPU and the CUDA program is running on a different GPU?

(any other possible way of doing this is welcome)

I don't know much about OpenGL…

But here is a logical question: can OpenGL create a context spread over multiple GPUs, or is an OpenGL context tied to one GPU? The OpenGL spec might have an answer.

We do support fast CUDA-graphics interop across GPUs when the rendering GPU is a Quadro.

But unless you are generating very large images, for simplicity I would recommend just transferring the image data back to the CPU for display. If you overlap the transfer with the computation (displaying the previous frame), the performance should be fine.
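
As a sketch of what I mean (names like `renderHalf` and `drawWithGL` are placeholders for your own kernel and display code): issue the kernel and the device-to-host copy in one stream, and draw the previous frame's staging buffer while they run.

```cuda
// Sketch: overlap the device->host readback of frame N with the display of
// frame N-1, using pinned host memory and double-buffered staging.
// renderHalf and drawWithGL are hypothetical placeholders.
#include <cuda_runtime.h>

__global__ void renderHalf(uchar4 *out, int frame);
void drawWithGL(const uchar4 *pixels);        // upload to a texture and draw

const size_t halfBytes = 1024 * 512 * 4;      // this GPU's half, RGBA8

uchar4 *d_half;                               // device buffer
uchar4 *h_half[2];                            // pinned host staging buffers
cudaStream_t stream;

void initPerGpu(int device) {
    cudaSetDevice(device);
    cudaMalloc(&d_half, halfBytes);
    cudaMallocHost(&h_half[0], halfBytes);    // pinned memory: required for
    cudaMallocHost(&h_half[1], halfBytes);    // truly asynchronous copies
    cudaStreamCreate(&stream);
}

void frame(int n) {
    int cur = n & 1, prev = cur ^ 1;
    renderHalf<<<dim3(64, 32), dim3(16, 16), 0, stream>>>(d_half, n);
    cudaMemcpyAsync(h_half[cur], d_half, halfBytes,
                    cudaMemcpyDeviceToHost, stream);
    drawWithGL(h_half[prev]);                 // display the previous frame
    cudaStreamSynchronize(stream);            // h_half[cur] is now complete
}
```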

BTW, you can control which GPU an OpenGL context gets created on using the GPU affinity extension (this is only supported on Quadro):
http://developer.download.nvidia.com/openg…pu_affinity.txt
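
A rough sketch of using that extension (Windows-only; the entry points are loaded through wglGetProcAddress, and error handling is mostly omitted):

```cpp
// Sketch: create an OpenGL context pinned to one specific GPU with
// WGL_NV_gpu_affinity (Quadro/Windows only).
#include <windows.h>
#include <GL/gl.h>
#include <GL/wglext.h>   // HGPUNV, PFNWGLENUMGPUSNVPROC, ...

HGLRC createContextOnGpu(UINT gpuIndex) {
    PFNWGLENUMGPUSNVPROC wglEnumGpusNV =
        (PFNWGLENUMGPUSNVPROC)wglGetProcAddress("wglEnumGpusNV");
    PFNWGLCREATEAFFINITYDCNVPROC wglCreateAffinityDCNV =
        (PFNWGLCREATEAFFINITYDCNVPROC)wglGetProcAddress("wglCreateAffinityDCNV");

    HGPUNV gpuList[2] = { 0, 0 };             // NULL-terminated list
    if (!wglEnumGpusNV(gpuIndex, &gpuList[0]))
        return 0;                             // no such GPU
    HDC affinityDC = wglCreateAffinityDCNV(gpuList);

    PIXELFORMATDESCRIPTOR pfd = { sizeof(pfd) };
    pfd.dwFlags = PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    SetPixelFormat(affinityDC, ChoosePixelFormat(affinityDC, &pfd), &pfd);
    return wglCreateContext(affinityDC);      // context lives on that GPU
}
```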

Thanks.

Does CUDA-graphics interop across GPUs work on regular (non-Quadro) GPUs, even if it is slower?

Specifically, I use a GTX 295, and each GPU computes one half of the image.

The images are 1024x1024, and I didn't want to copy the buffer to the CPU and back to the GPU because of the framerate hit we would incur.

Regarding the interleaving suggestion, I will check whether we can do an async copy.

We will have to prevent the new computation from writing values to the buffer until the old values have been fully copied. How can I achieve this?
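
From the docs, it looks like either stream ordering or an event could do it; here is a sketch of both (the `compute` kernel and buffer names are placeholders):

```cuda
// Sketch: two ways to keep the next kernel from overwriting a buffer that is
// still being copied out. Names (compute, d_buf, h_dst) are placeholders.
#include <cuda_runtime.h>

__global__ void compute(uchar4 *out);

// (a) Same-stream ordering: operations in one stream run in issue order, so a
// kernel launched after cudaMemcpyAsync in the same stream cannot start until
// the copy has finished.
void sameStream(uchar4 *d_buf, uchar4 *h_dst, size_t bytes, cudaStream_t s) {
    cudaMemcpyAsync(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost, s);
    compute<<<dim3(64, 32), dim3(16, 16), 0, s>>>(d_buf);  // waits implicitly
}

// (b) Two streams plus an event: copy in one stream, compute in another, and
// make the compute stream wait until the copy has drained.
void twoStreams(uchar4 *d_buf, uchar4 *h_dst, size_t bytes,
                cudaStream_t copyStream, cudaStream_t computeStream) {
    cudaEvent_t copyDone;
    cudaEventCreate(&copyDone);
    cudaMemcpyAsync(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost, copyStream);
    cudaEventRecord(copyDone, copyStream);
    cudaStreamWaitEvent(computeStream, copyDone, 0);       // gate the kernel
    compute<<<dim3(64, 32), dim3(16, 16), 0, computeStream>>>(d_buf);
    cudaEventDestroy(copyDone);   // destruction is deferred until it completes
}
```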

I second this question…

I’m going to assume NVIDIA clearly doesn’t support CUDA/GL interop between non-NVIDIA GL contexts and CUDA contexts…

BUT, is CUDA/GL interop between two different NVIDIA (CUDA-capable) devices supposed to be supported? What I’m currently seeing suggests otherwise (I get CUDA errors when trying to register a GL resource created by a different NVIDIA GPU).

Bump…

I, too, want to know more about this.
I’m rendering an OpenGL VBO, and I would like to modify different parts of it from different CUDA threads/GPUs.