I’ve written some GL code that displays a scene that I’d like to read from CUDA; however, I don’t want the scene to be displayed on my screen. From what I gather, I need to render to a GL_AUX buffer and then read from that in my CUDA device code. How do I tell GL/GLUT that I want to rasterize to a GL_AUX buffer instead of a display buffer?
OK, I’ve worked out how to do this. Before my display code I call
glDrawBuffer(GL_AUX0);
and my scene is then rasterized correctly to this buffer (as confirmed by
glReadBuffer(GL_AUX0); glReadPixels(...);
).
However, I can only get this to work after I call the GLUT init code, which involves creating a window. I don’t want the window to pop up; I’d just like my code to rasterize to the aux buffer, then have my CUDA code perform calculations on that buffer and dump the results to host memory. How do I achieve this?
int main(int argc, char **argv)
{
    init();          // GL/GLUT and CUDA setup
    load_model();
    display();       // Rasterizes the scene to GL_AUX0
    cuda_process();  // Reads GL_AUX0, runs kernels, copies results back to host memory
    write_results(); // Writes the host-side results to disk or somesuch
    return 0;
}
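For context, the render-and-readback step being described looks roughly like this; a minimal sketch, assuming a GL context with aux buffers already exists (the function name and the actual scene drawing are placeholders):

```cpp
// Sketch: rasterize to GL_AUX0 and read the pixels back to host memory.
// Assumes a current GL context whose pixel format includes aux buffers.
#include <GL/glut.h>
#include <vector>

void render_to_aux(int width, int height, std::vector<unsigned char> &pixels)
{
    glDrawBuffer(GL_AUX0);          // rasterize to the aux buffer, not the screen
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    // ... draw the scene here ...
    glFinish();                     // ensure rasterization has completed

    glReadBuffer(GL_AUX0);          // read back from the same buffer
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    pixels.resize(width * height * 4);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
}
```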
These days I would recommend using the framebuffer object (FBO) extension for off-screen rendering rather than AUX buffers.
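As a rough sketch, FBO setup looks like this, assuming the EXT_framebuffer_object extension is available (error checking omitted; attachment formats are illustrative):

```cpp
// Sketch: off-screen render target using a framebuffer object (FBO)
// instead of an AUX buffer. Requires EXT_framebuffer_object.
#include <GL/glew.h>   // loads the FBO entry points

GLuint make_fbo(int width, int height)
{
    GLuint fbo, color_rb, depth_rb;
    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);

    // Color attachment
    glGenRenderbuffersEXT(1, &color_rb);
    glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, color_rb);
    glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_RGBA8, width, height);
    glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                                 GL_RENDERBUFFER_EXT, color_rb);

    // Depth attachment
    glGenRenderbuffersEXT(1, &depth_rb);
    glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, depth_rb);
    glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, width, height);
    glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                                 GL_RENDERBUFFER_EXT, depth_rb);

    // While this FBO is bound, all rendering goes off-screen.
    return fbo;
}
```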
I don’t think there is any way with GLUT to create an OpenGL context without creating a window, so you’ll have to write your own initialization code using WGL or GLX.
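On the GLX side, one window-free approach is a pbuffer drawable. A minimal sketch for Linux, assuming GLX 1.3+ (error handling omitted; the 512×512 size is illustrative):

```cpp
// Sketch: create an OpenGL context with no visible window using GLX.
// A pbuffer serves as the off-screen drawable; no X window is mapped.
#include <X11/Xlib.h>
#include <GL/glx.h>

GLXContext make_headless_context(Display **dpy_out, GLXPbuffer *pbuf_out)
{
    Display *dpy = XOpenDisplay(NULL);

    static const int fb_attribs[] = {
        GLX_DRAWABLE_TYPE, GLX_PBUFFER_BIT,
        GLX_RENDER_TYPE,   GLX_RGBA_BIT,
        GLX_RED_SIZE, 8, GLX_GREEN_SIZE, 8, GLX_BLUE_SIZE, 8,
        GLX_DEPTH_SIZE, 24,
        None
    };
    int n_configs = 0;
    GLXFBConfig *configs =
        glXChooseFBConfig(dpy, DefaultScreen(dpy), fb_attribs, &n_configs);

    static const int pbuf_attribs[] = {
        GLX_PBUFFER_WIDTH, 512, GLX_PBUFFER_HEIGHT, 512, None
    };
    GLXPbuffer pbuf = glXCreatePbuffer(dpy, configs[0], pbuf_attribs);

    GLXContext ctx = glXCreateNewContext(dpy, configs[0], GLX_RGBA_TYPE,
                                         NULL, True);
    glXMakeContextCurrent(dpy, pbuf, pbuf, ctx);

    *dpy_out = dpy;
    *pbuf_out = pbuf;
    return ctx;
}
```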
Hmm. I spent quite some time setting up a PBO (à la the postProcessGL example), and I find no significant speedup compared with my old technique, which was simply to rasterize to GL_AUX0, glReadPixels from the framebuffer into a host array, and then memcpy the host array to the device. This seems very odd, but I’ve seen a few other forum posts hinting at similar issues; some suggest waiting for CUDA 2.0.
What is happening here? I would have figured that copying from a framebuffer (on the device) to global memory (on the device) would be orders of magnitude faster than framebuffer → host → device.
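For reference, the PBO path being timed is roughly the following; a sketch using the original CUDA/GL interop API of that era (cudaGLRegisterBufferObject and friends, deprecated in modern CUDA), with the buffer assumed to be created and registered once at setup:

```cpp
// Sketch: framebuffer -> PBO -> CUDA, avoiding the host round trip.
// Setup (once): glGenBuffers(1, &pbo); allocate it with glBufferData(...);
// then cudaGLRegisterBufferObject(pbo);
#include <GL/glew.h>
#include <cuda_gl_interop.h>

void readback_via_pbo(GLuint pbo, int width, int height)
{
    // With a pack PBO bound, glReadPixels writes into the buffer object
    // (the last argument is an offset, not a host pointer).
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

    // Map the PBO into CUDA's address space and run kernels on it directly.
    void *dev_ptr = 0;
    cudaGLMapBufferObject(&dev_ptr, pbo);
    // ... launch CUDA kernels on dev_ptr ...
    cudaGLUnmapBufferObject(pbo);
}
```

Note that whether this is actually device-to-device depends on the driver; if the implementation stages the pack readback through host memory internally, the PBO path can end up no faster than explicit glReadPixels plus cudaMemcpy.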