Using OpenGL fragment output in Cuda


I have an OpenGL GPGPU application in which I want to migrate some of the passes to Cuda. As far as I can see from the documentation and the extension specification, the way to transfer data from Cuda to OpenGL is to use buffer objects and the new texture buffer object extension. However, I’m a bit perplexed about how to transfer data the other way: since Cuda can only borrow buffers from OpenGL, I must somehow render to a buffer object. One approach is of course to render to a texture and then use a pixel buffer object to copy the contents of the texture into a buffer (i.e. how one would do render-to-vertex-buffer). However, this implies a copy. Another approach is to use transform feedback, performing all the computations in the vertex shader, but I’m a bit unsure about the performance of this…

Have I missed something here, or is there no way of getting GL fragment output into Cuda without a copy?


The driver would likely have to do a copy anyway due to possible memory format differences between OpenGL render targets and CUDA arrays. But the copy is GPU memory to GPU memory, which is really fast.

I think you need to render to an FBO and then do a readPixels into a PBO, then register that PBO with CUDA and map it to get a device pointer.
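A minimal sketch of that sequence, for illustration only. It assumes the graphics interop calls from later CUDA releases (`cudaGraphicsGLRegisterBuffer` and friends), a current OpenGL context with GLEW already initialized, and an existing FBO with an RGBA float color attachment. The names `createPbo`, `processFrame` and `scaleKernel` are hypothetical, and error checking is left out for brevity.

```cuda
// Sketch only: assumes a current OpenGL context, an initialized GLEW,
// and an FBO `fbo` with an RGBA float color attachment (width x height).
#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Hypothetical kernel standing in for the migrated pass:
// scales every float component in place.
__global__ void scaleKernel(float* data, size_t n, float s) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

// Create a PBO big enough for the render target and register it with CUDA.
// Do this once at startup, not every frame.
GLuint createPbo(int width, int height, cudaGraphicsResource_t* res) {
    GLuint pbo = 0;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER,
                 (GLsizeiptr)width * height * 4 * sizeof(float),
                 nullptr, GL_DYNAMIC_COPY);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    cudaGraphicsGLRegisterBuffer(res, pbo, cudaGraphicsRegisterFlagsNone);
    return pbo;
}

// Per frame, after rendering into `fbo`.
void processFrame(GLuint fbo, GLuint pbo, cudaGraphicsResource_t res,
                  int width, int height) {
    // 1. GPU-to-GPU copy: with a PBO bound to GL_PIXEL_PACK_BUFFER,
    //    the last glReadPixels argument is a byte offset into the PBO.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
    glReadBuffer(GL_COLOR_ATTACHMENT0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, nullptr);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);

    // 2. Map the PBO into CUDA's address space and run the kernel on it.
    float* devPtr = nullptr;
    size_t bytes = 0;
    cudaGraphicsMapResources(1, &res, 0);
    cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &bytes, res);
    size_t n = bytes / sizeof(float);
    unsigned blocks = (unsigned)((n + 255) / 256);
    scaleKernel<<<blocks, 256>>>(devPtr, n, 2.0f);

    // 3. Unmap before OpenGL touches the buffer again.
    cudaGraphicsUnmapResources(1, &res, 0);
}
```

The PBO is registered once and only mapped and unmapped per frame, since registration is the expensive part. While the buffer is mapped, OpenGL must not access it.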

We will have an example of this in a future release of the CUDA SDK. Let me know if you need more help getting it working.


Hi Mark,

Thanks for your reply. I’ve tried this with GL on pre-G80 cards, trying to offload vertex shader work to the fragment shader, and the copy had a large impact. Maybe it is cheaper now; I’ll give it a try. :)

I guess Cuda and GL sharing textures would solve the problem; hopefully a future Cuda release will provide this. :)