using GL_AUX

I’ve written some GL code that displays a scene that I’d like to read from CUDA, however I don’t want the scene to be displayed on my screen. From what I gather I need to write to a GL_AUX buffer, and then read from that in my CUDA device code. How do I go about telling GL/GLUT that I want to rasterize to a GL_AUX buffer instead of a display buffer?


OK. I’ve worked out how to do this. Before my display code I call


After that, my scene is rasterized correctly to this buffer, as confirmed by reading it back:

    glReadBuffer(GL_AUX0);
    glReadPixels(...);


However, I can only get this to work after calling the GLUT init code, which involves creating a window. I don’t want the window to pop up; I’d just like my code to rasterize to the aux buffer and then have my CUDA code perform calculations on that buffer, with the results dumped to host memory. How do I achieve this?

This is what my main looks like:

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    filepath = argv[1];

    glutInitDisplayMode(GLUT_DEPTH | GLUT_RGB | GLUT_DOUBLE);
    glutInitWindowSize(256, 256);

    return (0);
}


Ideally it would look something like this:

int main(int argc, char **argv)
{
    display();        // Writes to GL_AUX0
    cuda_process();   // Reads from GL_AUX0 and writes results to host process once complete
    write_results();  // Reads from host memory and writes results to disk or somesuch
}


These days I would recommend using the framebuffer object (FBO) extension for off-screen rendering rather than AUX buffers.

I don’t think there is any way with GLUT to create an OpenGL context without creating a window, so you’ll have to write your own initialization code using WGL or GLX.

Hmm. I spent quite some time setting up a PBO (à la the postProcessGL example) and found no significant speedup over my old technique, which was simply to rasterize to GL_AUX, use glReadPixels to copy from the framebuffer to a host array, and then memcpy the host array to the device. This seems very odd, but I’ve seen a few other forum posts hinting at similar issues. Some suggest waiting for CUDA 2.0.

What is happening here? I would have figured that copying from a framebuffer (on the device) to global memory (on the device) would be orders of magnitude faster than framebuffer → host → device.
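One way to sanity-check that expectation is to work out how much data actually moves per frame. The sketch below is only back-of-the-envelope arithmetic, not a measurement. The helper names (row_stride_bytes, readback_bytes, copy_time_ms) are made up for illustration, and the bandwidth and overhead arguments are placeholders to be replaced with values measured on the actual system. It assumes GL_UNSIGNED_BYTE components and uses OpenGL's default GL_PACK_ALIGNMENT of 4 when called with pack_alignment = 4.

```c
#include <stddef.h>

/* Bytes per row as glReadPixels writes them for GL_UNSIGNED_BYTE data:
 * each row is padded up to a multiple of GL_PACK_ALIGNMENT
 * (OpenGL's default value is 4). */
size_t row_stride_bytes(size_t width, size_t bytes_per_pixel,
                        size_t pack_alignment)
{
    size_t unpadded = width * bytes_per_pixel;
    return (unpadded + pack_alignment - 1) / pack_alignment * pack_alignment;
}

/* Host buffer size that is safe to allocate for one full-frame readback. */
size_t readback_bytes(size_t width, size_t height, size_t bytes_per_pixel,
                      size_t pack_alignment)
{
    return row_stride_bytes(width, bytes_per_pixel, pack_alignment) * height;
}

/* Rough time for one copy, in milliseconds: a fixed per-call overhead plus
 * payload size divided by bandwidth. Both bytes_per_second and overhead_ms
 * are assumptions supplied by the caller, not measured values. */
double copy_time_ms(size_t bytes, double bytes_per_second, double overhead_ms)
{
    return overhead_ms + 1000.0 * (double)bytes / bytes_per_second;
}
```

For the 256 x 256 window from the original post, readback_bytes(256, 256, 4, 4) is 262144 bytes (256 KiB) for an RGBA readback. With a payload that small, the fixed overhead term may be comparable to or larger than the bandwidth term on both paths, which would be one possible reason the two techniques time about the same. Timing a much larger frame would help separate the two effects.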