Quite a general question. I currently have a program running in CUDA 2.3 that makes use of OpenGL interoperability by the following steps:
- Create a pbo
- Render the scene
- Use glReadPixels
- Map/Register pbo to CUDA
- Run CUDA kernel
- Unmap/unregister pbo
I repeat this process many times. I’m not sure this is the best way of doing things, however (glReadPixels is slow, as is the mapping/unmapping). I’m aware that CUDA 3.0 has changed the way interoperability works, and I hear using textures is better than using glReadPixels. Can anyone offer me a quick step-by-step guide, similar to the above, that will give me the fastest way to render a scene with OpenGL and then do something with the results in CUDA, and possibly some links to examples?
How fast do you need it to be: 30 fps, 60 fps, >500 fps? I use the map/unmap chunk from the image denoising example and can manage a good 300-400 fps on a GTX 275. That example also uses textures (or maybe not, but I do, with a PBO).
edit: instead of using glReadPixels, I use a memcpy from my device memory (where the image is permanently stored) to the PBO, then to OGL to display.
I think the image denoiser is working differently. To me it seems to load an image, use CUDA to modify it, then OpenGL to display the result. I’m working the other way round. I wish to start with some scene data, render the scene using OpenGL, then use CUDA to do some work with the resulting image held in the framebuffer.
I tried running the fluidsGL example with some added code to time the mapping. Mapping took 1.47 ms on average, and unmapping 0.98 ms.
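For anyone wanting to reproduce this kind of measurement, here is a minimal host-side sketch of how such averages might be collected. It is not the code I added to fluidsGL: the names `average_ms` and `fps_ceiling` are hypothetical, and the `sync` callback is an assumption standing in for whatever blocks until the GPU is idle in your setup (for example `glFinish()` or a CUDA synchronize call), since map/unmap calls can return before the work has finished.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock time of `op` in milliseconds.
// `sync` is assumed to block until pending GPU work is done; it runs
// before the timer starts and again before it stops, so only `op`
// (plus one sync) is measured.
double average_ms(const std::function<void()>& op,
                  const std::function<void()>& sync,
                  int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    double total_ms = 0.0;
    for (int i = 0; i < iterations; ++i) {
        sync();                          // drain earlier work first
        const auto start = clock::now();
        op();                            // e.g. the map or unmap call
        sync();                          // wait for it to complete
        const auto stop = clock::now();
        total_ms +=
            std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / iterations;
}

// Hypothetical helper: upper bound on frames per second if every frame
// pays `overhead_ms` and nothing else. Returns 0 for non-positive input.
double fps_ceiling(double overhead_ms) {
    return overhead_ms > 0.0 ? 1000.0 / overhead_ms : 0.0;
}
```

With the figures above, the per-frame overhead is 1.47 ms + 0.98 ms = 2.45 ms, so `fps_ceiling(2.45)` gives roughly 408 fps as a ceiling from map/unmap alone, before any rendering, glReadPixels, or kernel time is counted.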