OpenGL performance issue: glReadPixels and cudaGLMapBufferObject are slow


Currently, I am rendering an object with OpenGL and post-processing the resulting image.

My images are 640x480 greyscale, one byte per pixel.

Because glReadPixels was very slow when transferring data from the GPU to host memory (2.4ms), I decided to post-process the OpenGL-rendered image directly with CUDA.

To my surprise, performance is even worse when the data stays on the GPU: glReadPixels into a pixel buffer object still takes 2.4ms, and cudaGLMapBufferObject adds another 1ms.

My kernel code performs very well :), but glReadPixels and cudaGLMapBufferObject seem to be the bottleneck.

Perhaps I have made a mistake in my code?

Thank you!

```cpp
renderObjectWithOpenGL(...);   // 0.05ms-3ms (depends on object)

// Bind and (re)allocate the pixel pack buffer: 640*480 bytes, one per pixel.
glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, bufferIDx);                       // negligible amount of time
glBufferData(GL_PIXEL_PACK_BUFFER_ARB, 640*480, NULL, GL_STREAM_READ);   // negligible amount of time

cudaGLRegisterBufferObject(bufferIDx);                                   // 0.4ms

// With a pack buffer bound, the last argument is a byte offset into it, not a host pointer.
glReadPixels(0, 0, 640, 480, GL_LUMINANCE, GL_UNSIGNED_BYTE, 0);         // 2.4ms

cudaGLMapBufferObject((void**)&in_data, bufferIDx);                      // 1ms

launch_kernel(in_data, d_result);

cudaGLUnmapBufferObject(bufferIDx);                                      // 0.4ms
cudaGLUnregisterBufferObject(bufferIDx);                                 // 0.3ms

cudaMemcpy(resultcuda, d_result, blub*sizeof(float), cudaMemcpyDeviceToHost);

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, 0);                               // negligible amount of time
```


I am also having trouble with the performance of cudaGLMapBufferObject. Did you find a solution in the meantime?

  • simon

First of all, did you call cudaGLSetGLDevice to select the device? It specifies that the CUDA context will use OpenGL interoperability. (If I recall correctly, it should be called after OpenGL has been initialized.)
Second, try using the GL_DYNAMIC_COPY_ARB usage flag instead of GL_STREAM_READ. The flag is a hint to the driver about where the buffer should be created. If you are not careful, glReadPixels will copy the data via the host instead of leaving it on the device, and/or the data will be cached in some cases. The correct flag can make a very big difference here.