I have a 640x480 texture in OpenGL with internal format GL_RGBA32F_ARB (32-bit floats per channel, not bytes), and I need to get its data over to CUDA.
Reading it back with glGetTexImage (OpenGL -> CPU) and then uploading it with cudaMemcpy (CPU -> CUDA) takes 6 ms in total. I assume that is about 3 ms per step, but I have not measured the steps separately.
The strange thing is that using CUDA's ability to map OpenGL buffer objects takes 9 ms. I was hoping that approach would be a little faster, not slower.
Does anyone have timings of their own? I have to say I expected more speed, or at least the same speed as the naive first version.
Can anyone confirm my performance measurements?
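For scale, this is how much data one frame amounts to. A minimal C++ sketch of the arithmetic; the constant names are made up for illustration, and the readback size assumes the GL_RGB / GL_FLOAT format that both methods pass to glGetTexImage (the alpha channel is dropped, so 12 of the 16 bytes per texel are transferred).

```cpp
// Hypothetical constants for one 640x480 frame.
constexpr long kWidth = 640;
constexpr long kHeight = 480;
constexpr long kBytesPerChannel = 4;  // GL_RGBA32F_ARB: one 32-bit float per channel

// Nominal size of the texture data: 4 channels (RGBA) per texel.
constexpr long kTextureBytes = kWidth * kHeight * 4 * kBytesPerChannel;   // 4,915,200

// Size of what glGetTexImage(..., GL_RGB, GL_FLOAT, ...) returns:
// 3 channels per texel, i.e. sizeof(float3) * 640 * 480.
constexpr long kReadbackBytes = kWidth * kHeight * 3 * kBytesPerChannel;  // 3,686,400

// The RGB readback is exactly three quarters of the RGBA texture.
static_assert(kReadbackBytes * 4 == kTextureBytes * 3, "RGB readback is 3/4 of RGBA");
```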
The first method (6 ms) is straightforward:
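```cpp
// Method 1: OpenGL -> host memory -> CUDA
// pixels: host buffer of 640 * 480 float3; d_input: device buffer of the same size
glBindTexture(GL_TEXTURE_2D, tex);
// Read level 0 back as 3 floats per texel (GL_RGB drops the alpha channel)
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGB, GL_FLOAT, pixels);
// Upload the packed RGB floats to the device
CUDA_SAFE_CALL( cudaMemcpy(d_input, pixels, sizeof(float3) * 640 * 480, cudaMemcpyHostToDevice) );
```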
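The "3 ms per step" above is a guess, and one way to check it is to time each stage separately on the host. A minimal sketch assuming C++11 `std::chrono`; `time_stage_ms` is a hypothetical helper, not part of OpenGL or CUDA. Because OpenGL and CUDA calls may return before the GPU work is done, a timed stage should end with a synchronizing call such as `glFinish()` or `cudaThreadSynchronize()`. Calling `glFinish()` before starting the clock also keeps earlier rendering into `tex` from being billed to glGetTexImage, which has to wait for it.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: wall-clock duration of one stage in milliseconds.
// OpenGL and CUDA calls may return before the GPU has finished, so the
// stage itself should end with a synchronizing call (glFinish(),
// cudaThreadSynchronize()) for the number to mean anything.
double time_stage_ms(const std::function<void()>& stage) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    stage();
    const auto stop = clock::now();
    const double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    assert(ms >= 0.0);  // steady_clock is monotonic
    return ms;
}

// Usage with the calls from the first method (needs a GL context and CUDA):
//   glFinish();  // flush earlier rendering into tex before timing
//   double t_read = time_stage_ms([&] {
//       glGetTexImage(GL_TEXTURE_2D, 0, GL_RGB, GL_FLOAT, pixels);
//   });
//   double t_copy = time_stage_ms([&] {
//       cudaMemcpy(d_input, pixels, sizeof(float3) * 640 * 480,
//                  cudaMemcpyHostToDevice);
//       cudaThreadSynchronize();
//   });
```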
I implemented the second method (9 ms) this way (shortened):
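```cpp
// Done once: create the buffer object and register it with CUDA
glBindBuffer(GL_ARRAY_BUFFER, buffer);
glBufferData(GL_ARRAY_BUFFER, s, fakeNonNullData, GL_DYNAMIC_DRAW);
CUDA_SAFE_CALL( cudaGLRegisterBufferObject(buffer) );

// Done every time: read the texture into the buffer object, then map it for CUDA
glBindBuffer(GL_PIXEL_PACK_BUFFER, buffer);
glBindTexture(GL_TEXTURE_2D, tex);
// With a pack buffer bound, the last argument is a byte offset into that buffer
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGB, GL_FLOAT, 0);
CUDA_SAFE_CALL( cudaGLMapBufferObject((void**)&d_input, buffer) );
```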
Of course, I create the buffer only once for the whole lifetime of the application, and I register it (cudaGLRegisterBufferObject) only once as well. The only calls I repeat are glGetTexImage and cudaGLMapBufferObject.
Of those 9 ms, glGetTexImage takes 6.5 ms and cudaGLMapBufferObject takes 2.5 ms.
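Converting the reported times into effective throughput makes them easier to compare. A C++ sketch of the arithmetic with hypothetical names, assuming every stage is charged with the full 3,686,400-byte RGB frame; the post does not show whether cudaGLMapBufferObject actually copies the data, so its figure is only a nominal rate.

```cpp
// Bytes per frame for the GL_RGB / GL_FLOAT readback: 640 * 480 * sizeof(float3).
constexpr double kFrameBytes = 640.0 * 480.0 * 12.0;  // 3,686,400 bytes

// Effective throughput in MB/s (1 MB = 10^6 bytes) when kFrameBytes take `ms`:
// bytes/ms * 1000 ms/s / 10^6 bytes/MB = bytes / ms / 1000.
constexpr double mb_per_s(double bytes, double ms) { return bytes / ms / 1e3; }

// Times reported in the post, in milliseconds.
constexpr double kHostPathMs = 6.0;        // glGetTexImage + cudaMemcpy
constexpr double kGetTexImagePboMs = 6.5;  // glGetTexImage into the pixel pack buffer
constexpr double kMapMs = 2.5;             // cudaGLMapBufferObject
constexpr double kPboPathMs = kGetTexImagePboMs + kMapMs;  // 9.0

constexpr double kHostPathMBs = mb_per_s(kFrameBytes, kHostPathMs);              // ~614.4
constexpr double kPboPathMBs = mb_per_s(kFrameBytes, kPboPathMs);                // ~409.6
constexpr double kGetTexImagePboMBs = mb_per_s(kFrameBytes, kGetTexImagePboMs);  // ~567.1
constexpr double kMapMBs = mb_per_s(kFrameBytes, kMapMs);                        // ~1474.6
```

By this measure, the glGetTexImage call into the buffer object alone (about 567 MB/s) is slower than the whole host round trip of the first method (about 614 MB/s).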
Did you spot any obvious mistakes?
I have read the other threads about buffer objects and checked the SDK examples.
Edit: Sorry, I forgot to mention that I am using CUDA 1.0.