Slow readback from PBuffer to CUDA memory

I’m just wondering if anybody has had any luck transferring data from a 16-bit float PBuffer to CUDA memory via a PBO at fast speeds. If I use an 8-bit PBuffer and 8-bit PBO data, I get pretty good speeds, but I need to use a 16-bit float PBuffer with 10-bit integer data (10_10_10_2 packing) in CUDA memory.
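For reference, the 10_10_10_2 packing mentioned above can be sketched as a small host-side helper. The bit order (which channel sits in the low bits) is an assumption here and depends on the exact GL format in use; the helper name is hypothetical:

```cpp
#include <cstdint>
#include <algorithm>

// Pack four normalized [0,1] floats into one 10_10_10_2 word.
// Assumed layout: bits 0-9 red, 10-19 green, 20-29 blue, 30-31 alpha.
uint32_t pack_10_10_10_2(float r, float g, float b, float a) {
    auto q = [](float v, uint32_t maxv) {
        v = std::min(1.0f, std::max(0.0f, v)); // clamp to [0,1]
        return (uint32_t)(v * maxv + 0.5f);    // quantize, round to nearest
    };
    return q(r, 1023) | (q(g, 1023) << 10) | (q(b, 1023) << 20) | (q(a, 3) << 30);
}
```

The same bit arithmetic would run unchanged inside a CUDA kernel, which is where the packing would normally happen after mapping the PBO.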

I’m using the technique shown in the postProcessGL example program in the SDK, but I’m not getting good speeds with anything other than 8-bit packing.

Are there any faster methods to read back the data from a 16-bit float PBuffer?


Still using PBuffers? I’d recommend using FBOs with attached textures or renderbuffers.


Can you profile the kernel (using the profiler supplied in the CUDA SDK)? I haven’t really done too much with OpenGL, but my off-the-cuff guess is that reading the 10-bit integers is causing extra memory reads somewhere.

Actually, I have completely disabled the kernel now. The problem seems to be the glReadPixels call itself, reading back from the 16-bit float PBuffer into the 10-bit OpenGL PBO. If I use an 8-bit PBuffer and an 8-bit PBO, I get excellent speeds.

Is there a better method to get screen/off-screen rendered data using OpenGL back to CUDA for processing?

Hmmm I can switch to using FBOs, but it seems I still need to use glReadPixels to go back to CUDA via a PBO. Or is there a better way?

I’ll try this, but I suspect I’ll run into the same problem when I try to read non-8-bit data.

Check the last post here :…hl=glReadPixels

glReadPixels is apparently very slow.

That’s a weird post in your link. The poster suggests using glTexSubImage instead of glReadPixels, but glTexSubImage transfers data from a PBO to a texture, not from a framebuffer/FBO to a PBO.

Maybe he meant to say glGetTexImage, but that seems very unlikely, as you would need a GL_PIXEL_PACK_BUFFER instead of a GL_PIXEL_UNPACK_BUFFER.

I used glGetTexImage with a GL_PIXEL_PACK_BUFFER without problems.


Ok I made some progress with this.

This is what I am doing now:

  1. Render to a 16-bit half float PBuffer/FBO
  2. glReadPixels as GL_HALF_FLOAT_NV in an RGBA format back to a PBO (this is quite fast since the formats of the PBO and FBO match)
  3. Map to CUDA device memory using CUDA 2.2 (this is quite fast as well)

Where I am stuck is that I am unable to read the half-float array in CUDA, since CUDA doesn’t seem to support 16-bit floats.

Any ideas anyone?
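In the meantime, since the mapped PBO is just raw 16-bit words, the IEEE 754 binary16 bit pattern can be decoded by hand. This is a host-side C++ sketch (the function name is made up); the same bit logic could be ported into a kernel that reads the data as unsigned short:

```cpp
#include <cstdint>
#include <cmath>

// Decode one IEEE 754 binary16 value (stored as a uint16_t) to float.
// Layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
float halfToFloat(uint16_t h) {
    uint32_t sign = (h >> 15) & 0x1;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    float s = sign ? -1.0f : 1.0f;
    if (exp == 0)                       // zero or subnormal
        return s * std::ldexp((float)mant, -24);
    if (exp == 31)                      // Inf or NaN
        return mant ? NAN : s * INFINITY;
    return s * std::ldexp(1.0f + mant / 1024.0f, (int)exp - 15);
}
```

Per-pixel bit twiddling like this does add ALU work per element, so it is a stopgap rather than a substitute for native support.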

You could wait for CUDA 2.3 to be released. It has new support for fp16 <-> fp32 conversion intrinsics, which allows storing data in fp16 format while computing in fp32, or you can use the Driver API, which supports fp16 array formats.
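For illustration, the storage format those intrinsics convert to can be sketched with a minimal host-side encoder. This is a simplification (it truncates the mantissa instead of rounding to nearest even, and flushes subnormals to zero), so it is not bit-exact with a proper fp32-to-fp16 conversion for all inputs:

```cpp
#include <cstdint>
#include <cstring>

// Minimal float -> IEEE 754 binary16 encode (truncating, simplified:
// no round-to-nearest, subnormal results flushed to signed zero).
uint16_t floatToHalf(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);            // reinterpret float bits
    uint16_t sign = (bits >> 16) & 0x8000;
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFF) - 127 + 15; // rebias 8->5 bit
    uint32_t mant = (bits >> 13) & 0x3FF;           // keep top 10 mantissa bits
    if (exp <= 0)  return sign;                     // underflow: flush to zero
    if (exp >= 31) return sign | 0x7C00;            // overflow: clamp to Inf
    return sign | (uint16_t)(exp << 10) | (uint16_t)mant;
}
```

Values that are exactly representable in binary16 (powers of two, small integers) round-trip exactly even with this truncating version.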


Yup I tried the CUDA 2.3 beta, and that does indeed solve my problem. Excellent - thanks everyone for your suggestions and help.