device->host->device copy vs cudaGLMapBufferObject 6vs9ms, shouldn't mapping be way faster

LastBoyScout · July 12, 2007, 2:47pm

Hi,

I have an OpenGL texture (640x480, GL_RGBA32F_ARB, so one float per channel rather than one byte) and I need to get its data over to CUDA.

Reading it back with glGetTexImage (OpenGL->CPU) and then uploading it with cudaMemcpy (CPU->CUDA) takes 6ms in total. I guess that is about 3ms per operation, but I didn’t measure the two steps separately.
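Since the per-step split above is only a guess, one way to check it is to time each step on its own. The following is a minimal sketch of a generic timing helper; `meanMilliseconds` is a hypothetical name, not part of CUDA or OpenGL, and it only measures wall-clock time on the CPU.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of CUDA or OpenGL): runs `step` `iterations`
// times and returns the mean wall-clock time per run in milliseconds.
double meanMilliseconds(const std::function<void()>& step, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        step();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

In the application, `step` would wrap a single call such as glGetTexImage. Because OpenGL commands can return before the GPU has finished, the wrapped step probably also needs a glFinish() at the end, otherwise the measured time may be attributed to whichever later call happens to wait.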

The strange thing is that using CUDA’s ability to map buffer objects takes 9ms. I was hoping it would be a little faster.

Does anyone have timings of their own? I expected more speed, or at least the same speed as the first, naive version.

Can anyone confirm my performance measurements?

The first method (6ms) is straightforward:

// Read the texture back to host memory, then upload it to the CUDA device.
glBindTexture(GL_TEXTURE_2D, tex);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGB, GL_FLOAT, pixels);
CUDA_SAFE_CALL( cudaMemcpy( d_input, pixels, sizeof(float3) * 640 * 480, cudaMemcpyHostToDevice ) );
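To put the timings in context, it helps to work out how many bytes each transfer moves and what throughput that implies. This is a small sketch of that arithmetic; `imageBytes` and `megabytesPerSecond` are hypothetical helper names, and the 3ms figure used in the note below is the estimate from the post, not a measured value.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by a width x height image with `channels` 32-bit float channels.
std::size_t imageBytes(std::size_t width, std::size_t height, std::size_t channels) {
    return width * height * channels * sizeof(float);
}

// Effective throughput in MB/s (1 MB = 1e6 bytes) for moving `bytes` in `milliseconds`.
double megabytesPerSecond(std::size_t bytes, double milliseconds) {
    assert(milliseconds > 0.0);
    return static_cast<double>(bytes) / 1.0e6 / (milliseconds / 1.0e3);
}
```

For the RGB readback used here, imageBytes(640, 480, 3) is 3,686,400 bytes, which at an assumed 3ms per step works out to about 1229 MB/s. The texture itself is stored as RGBA, which would be 4,915,200 bytes, so the GL_RGB readback also implies a per-pixel format conversion somewhere along the way.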

I implemented the second method (9ms) this way (shortened):

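// One-time setup: allocate the buffer and register it with CUDA.
glBindBuffer(GL_ARRAY_BUFFER, buffer);
glBufferData(GL_ARRAY_BUFFER, s, fakeNonNullData, GL_DYNAMIC_DRAW);
CUDA_SAFE_CALL( cudaGLRegisterBufferObject(buffer) );
// Read the texture into the buffer (bound as the pixel pack target, offset 0), then map it into CUDA.
glBindBuffer(GL_PIXEL_PACK_BUFFER, buffer);
glBindTexture(GL_TEXTURE_2D, tex);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGB, GL_FLOAT, 0);
CUDA_SAFE_CALL( cudaGLMapBufferObject( (void**)&d_input, buffer) );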

Of course, I create the buffer and register it (cudaGLRegisterBufferObject) only once for the whole application lifecycle. The only calls I repeat are glGetTexImage and cudaGLMapBufferObject.

glGetTexImage takes 6.5ms and cudaGLMapBufferObject takes 2.5ms.

Did you spot any obvious mistakes?

I read the other threads about buffer objects and I also checked the sdk examples.

thx

LastBoyScout

edit: Ah sorry, I am using CUDA 1.0.
