Currently, I am rendering an object with OpenGL and post-processing the resulting image.
My images are 640x480, 8-bit greyscale.
Since glReadPixels was very slow when transferring data from the GPU to host memory (2.4ms), I decided to post-process the OpenGL-rendered image directly with CUDA.
I am surprised that performance is even worse on the GPU: glReadPixels takes 2.4ms and cudaGLMapBufferObject another 1ms.
Although my kernel code performs very well :), glReadPixels and cudaGLMapBufferObject seem to be the bottleneck.
Perhaps there is a mistake in my code?
Thank you!
renderObjectWithOpenGL(...); //0.05ms-3ms (depends on object)
glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, bufferIDx); //negligible amount of time
glBufferData(GL_PIXEL_PACK_BUFFER_ARB, 640*480, NULL, GL_STREAM_READ); //negligible amount of time
cudaGLRegisterBufferObject(bufferIDx); //0.4ms
glReadPixels(0, 0, 640, 480, GL_LUMINANCE, GL_UNSIGNED_BYTE, 0); //2.4ms
cudaGLMapBufferObject( (void**)&in_data, bufferIDx); //1ms
launch_kernel(in_data, d_result); //post-processing kernel on the mapped PBO
cudaThreadSynchronize(); //wait for the kernel to finish before unmapping
cudaGLUnmapBufferObject(bufferIDx); //0.4ms
cudaGLUnregisterBufferObject(bufferIDx); //0.3ms
cudaMemcpy(resultcuda, d_result, blub*sizeof(float), cudaMemcpyDeviceToHost); //copy results back to host
glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, 0); //negligible amount of time
First of all, did you use cudaGLSetGLDevice to set the device? If I recall correctly, it should be called after OpenGL is initialized, to specify that the CUDA context will use OpenGL interoperability.
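To illustrate the call order, here is a minimal sketch; it assumes GLUT creates the GL context and that device 0 drives the display, so adapt both to your own setup:

```cpp
// Sketch of the required initialization order (assumed: GLUT windowing,
// CUDA device 0). cudaGLSetGLDevice must come after context creation.
#include <GL/glut.h>
#include <cuda_gl_interop.h>

int main(int argc, char **argv)
{
    glutInit(&argc, argv);        // 1. create the OpenGL context first
    glutCreateWindow("render");
    cudaGLSetGLDevice(0);         // 2. then attach CUDA with GL interop enabled
    // ... create and register the PBO, then enter the render loop ...
    return 0;
}
```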
Second, try using the GL_DYNAMIC_COPY_ARB usage hint instead of GL_STREAM_READ. These flags matter because they hint to the driver where the buffer should be allocated. If you are not careful, glReadPixels will copy the data via the host instead of leaving it on the device, and/or the driver will cache the data in some cases. The correct usage hint can make a very big difference here.
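Concretely, that suggestion is a one-line change to the glBufferData call in the snippet above:

```cpp
// GL_DYNAMIC_COPY_ARB hints that the buffer is written by GL (glReadPixels)
// and read by another API (CUDA), so the driver can keep it in device memory
// instead of staging it through the host.
glBufferData(GL_PIXEL_PACK_BUFFER_ARB, 640 * 480, NULL, GL_DYNAMIC_COPY_ARB);
```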