readPixels performance

Hi all,

glReadPixels seems to be running quite slowly for me.

void display()
{
  glClearColor(1, 1, 1, 1);
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  glLoadIdentity();

  glGenBuffers(1, &pbo);
  glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
  glBufferData(GL_PIXEL_PACK_BUFFER, 768*256*sizeof(float), NULL, GL_STREAM_READ);

  glReadPixels(0, 0, 768, 256, GL_RED, GL_FLOAT, NULL);

  ...

The only other code in display() is a counter that I print to the screen to see how many frames have been rendered. display() is called by the idle function. If I comment out glReadPixels, the counter climbs extremely quickly. With glReadPixels, however, I get about 300 renders per second (it’s a primitive way of testing this, but it’ll do for now).

Basically I want to render something to the screen, then use glReadPixels to transfer the data to a PBO which I can map to CUDA. I can’t help but feel glReadPixels should be much faster than this. Or is there a better way to tackle this problem?

It may be that, since you’re only reading the red channel, you’re forcing the driver to read the image back to the CPU so that it can be reformatted before being transferred to the PBO. I believe a glReadPixels call like this will only map to a direct copy in video memory if the format and type requested by glReadPixels match the format the framebuffer is stored in.

Regardless of this, we’re still working on improving the OpenGL interop performance.
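To make the "reformatted" step above concrete, here is a minimal sketch of the kind of per-pixel conversion that would be needed if the framebuffer were stored as 8-bit BGRA while the readback asks for GL_RED / GL_FLOAT. The storage format is an assumption, and the function name bgra8_to_red_float is hypothetical. It only illustrates the extra work involved; it is not a description of what any particular driver does.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustration only (hypothetical helper, not driver code).
 * Assumes the source pixels are tightly packed 8-bit B, G, R, A.
 * Writes one float per pixel: the red channel normalized to [0, 1],
 * which is how OpenGL converts unsigned normalized values to GL_FLOAT. */
void bgra8_to_red_float(const uint8_t *src, float *dst, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        /* Byte 2 of each 4-byte pixel is red in BGRA order. */
        dst[i] = (float)src[4 * i + 2] / 255.0f;
    }
}
```

If the requested format already matches the stored one, no loop like this is needed and the transfer can in principle be a plain copy.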

Hmm, yes. Reading GL_BGRA into an equivalently sized buffer is faster, but still not nearly as fast as I would like.
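For reference, the buffer sizes in the two cases can be checked with a small sketch. It assumes the GL_BGRA readback uses GL_UNSIGNED_BYTE (the post does not say which type was used) and that rows are tightly packed; the helper name readback_bytes is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for a tightly packed readback.
 * Assumes each row needs no padding, which holds for a 768-pixel-wide
 * image with 4 bytes per pixel under the default pack alignment of 4. */
size_t readback_bytes(size_t width, size_t height,
                      size_t components, size_t bytes_per_component)
{
    return width * height * components * bytes_per_component;
}
```

With a 4-byte float, readback_bytes(768, 256, 1, sizeof(float)) for GL_RED / GL_FLOAT and readback_bytes(768, 256, 4, 1) for GL_BGRA / GL_UNSIGNED_BYTE both give 786432 bytes, so the two readbacks move the same amount of data and differ only in format.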

When rendering I don’t want to render a colour, I wish to render a value. Say I have 3 objects, Object 1, Object 2, and Object 3. When rendering I wish to place their object number in the pixel buffer, rather than their colour. Then get all this data into CUDA. What’s the best way of going about this?
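One possible approach, sketched here under assumptions the thread does not confirm, is to draw each object in a flat colour whose bytes encode its object number, then decode the number from the pixels that are read back. The sketch assumes an 8-bit-per-channel colour buffer, a GL_BGRA / GL_UNSIGNED_BYTE readback, and that lighting, blending, dithering and antialiasing are disabled so the bytes arrive unmodified. The names IdColour, id_to_colour and bgra_to_id are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the colour bytes used to draw one object. */
typedef struct { uint8_t r, g, b; } IdColour;

/* Split a 24-bit object number into three 8-bit channels
 * (low byte in red, middle byte in green, high byte in blue). */
IdColour id_to_colour(uint32_t id)
{
    IdColour c;
    assert(id <= 0xFFFFFFu); /* only 24 bits fit in R, G and B */
    c.r = (uint8_t)(id & 0xFFu);
    c.g = (uint8_t)((id >> 8) & 0xFFu);
    c.b = (uint8_t)((id >> 16) & 0xFFu);
    return c;
}

/* Recover the object number from one pixel read back as
 * GL_BGRA / GL_UNSIGNED_BYTE, i.e. bytes in the order B, G, R, A. */
uint32_t bgra_to_id(const uint8_t *px)
{
    return (uint32_t)px[2]
         | ((uint32_t)px[1] << 8)
         | ((uint32_t)px[0] << 16);
}
```

When drawing, the three bytes from id_to_colour(n) could be passed to glColor3ub for object n. The same decoding as bgra_to_id could then be applied per pixel to the mapped buffer on the CUDA side.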