Integrating CUDA with existing OpenGL apps

Hi,

While researching geometry shaders, I stumbled across CUDA and have become transfixed by it. I have a dual 8800 setup. After modifying the code slightly so that it skips the check for interoperability conflicts (what is this anyway?) and setting my system to use SLI, I was able to run the demo applications.

While cool demos can be mesmerizing to watch (fluidGL for instance), I really need to know whether I can “easily” add CUDA support to an existing OpenGL application.

My application as it stands uses FBOs to render a main 3D scene to texture, and additional FBOs are used to perform various 2D image enhancements on the original rendered texture. For instance, the first FBO would contain the scene. The next FBO would have a shader attached that takes the first FBO’s texture and converts it to grayscale. The next FBO would have a shader attached that adds noise to the image. And so on.

Because our current system uses floating point textures, there are only so many FBOs that we can set up before we start seeing drops in performance; and due to the various restrictions (for lack of a better word) that fragment shaders impose, some of the effects require multiple passes just to get prepared for rendering.

So, my basic question is “can I set up a frame buffer object in a traditional OpenGL application (on Windows XP) that can be manipulated by CUDA?” I’m a little wary because the demos that I have looked at are compiled via NVIDIA’s C compiler, but it wouldn’t make much sense for NVIDIA not to allow an existing application to simply add CUDA support.

Has anyone done something similar to this already? Are there demos or tutorials around that can help me? I’ve got the CUDA programming guide and am looking through it, but so far I haven’t found an answer. Any help would be appreciated.

Thanks,
Brian

I don’t know anything about FBOs, but it is easy to integrate CUDA into any application. Only the .cu files, which contain the kernels and thin wrappers that call those kernels (with the wrappers declared extern "C"), need to be compiled by nvcc. Just link the object files generated by nvcc into the rest of your application, no problem.
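To make that concrete, here is a minimal sketch of what such a .cu file might look like. The file name scale_kernel.cu, the kernel scale_kernel, and the wrapper scale_on_gpu are made-up names for illustration, not anything from the SDK; the kernel just multiplies an array by a factor.

```cuda
// scale_kernel.cu (hypothetical file name): the only file nvcc has to compile.
#include <cstddef>
#include <cuda_runtime.h>

// Illustrative kernel: multiply each of the n elements by factor.
__global__ void scale_kernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Thin wrapper with C linkage, so code built by the regular compiler can call it.
// Returns 0 on success and a non-zero code if any CUDA call fails.
extern "C" int scale_on_gpu(float *host_data, int n, float factor)
{
    if (n <= 0) {
        return 0;  // nothing to do; also avoids a zero-block launch
    }

    float *dev = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) {
        return 1;
    }
    if (cudaMemcpy(dev, host_data, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return 2;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    scale_kernel<<<blocks, threads>>>(dev, n, factor);
    if (cudaGetLastError() != cudaSuccess) {   // catches launch failures
        cudaFree(dev);
        return 3;
    }

    // This copy waits for the kernel to finish before reading the results back.
    cudaError_t err = cudaMemcpy(host_data, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return (err == cudaSuccess) ? 0 : 4;
}
```

The rest of the application then only needs the declaration extern "C" int scale_on_gpu(float *host_data, int n, float factor); and can be built with its usual compiler. Something like nvcc -c scale_kernel.cu produces the object file, which you link together with the CUDA runtime library (cudart). The exact project settings depend on your build setup, so treat that part as an assumption.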

You do need to be aware of threading issues, though (discussed in the programming guide). Each host thread gets its own CUDA context, and memory allocated in one context is not visible from another. That means you can’t allocate CUDA memory in one thread and use it in another.