GPU to CPU memory transfers

Does CUDA have some way to send raw data back to the CPU, and at what speeds?

I've been using DirectX and writing shaders to process images and video, but the only way I have found to get any data out of the GPU is to copy the back buffer before it is sent to the screen, and in that case I get just an image instead of the raw processed data points.

Basically, I'm doing a lot of filtering and postprocessing on medical images. I need the GPU's speed to run them, but I would also like to be able to save the processed data points.

I would also like a faster way of getting the data out of the GPU: when I grab the images, I hit a max frame rate of about 8 fps, instead of 100+ fps without the grab. I've looked a little at DirectShow, but it seems like a real pain.

Hopefully this hasn't been addressed a million times in posts already. I tried searching this forum, but something won't let me.

Yes indeed. This is very efficient and runs at pretty much the full rate of the PCIe bus, roughly 5 GB/s.

The call you need is cudaMemcpy. It's used in almost all CUDA apps, and you'll find it in the CUDA docs and in probably every CUDA SDK example.
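To make that concrete, here is a minimal sketch of the round trip: upload raw floats, run a kernel, and copy the raw results back with cudaMemcpy. The kernel `scaleKernel`, the helper `check`, and the function `processOnGpu` are made-up names for illustration only; a real filter would replace the kernel body.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for a real filter: multiplies every sample by a gain.
__global__ void scaleKernel(const float* in, float* out, int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * gain;
}

// Abort with a readable message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Uploads `input`, runs the kernel, and returns the raw results on the host.
std::vector<float> processOnGpu(const std::vector<float>& input, float gain)
{
    const int n = static_cast<int>(input.size());
    const size_t bytes = n * sizeof(float);
    std::vector<float> output(n);
    if (n == 0) return output;

    float* dIn = nullptr;
    float* dOut = nullptr;
    check(cudaMalloc(&dIn, bytes), "cudaMalloc dIn");
    check(cudaMalloc(&dOut, bytes), "cudaMalloc dOut");

    // Host -> device copy of the raw input samples.
    check(cudaMemcpy(dIn, input.data(), bytes, cudaMemcpyHostToDevice),
          "cudaMemcpy host to device");

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(dIn, dOut, n, gain);
    check(cudaGetLastError(), "kernel launch");

    // Device -> host copy of the raw results. This call blocks until the
    // kernel above has finished, so no explicit synchronization is needed.
    check(cudaMemcpy(output.data(), dOut, bytes, cudaMemcpyDeviceToHost),
          "cudaMemcpy device to host");

    check(cudaFree(dIn), "cudaFree dIn");
    check(cudaFree(dOut), "cudaFree dOut");
    return output;
}
```

The values that come back in `output` are the processed floats themselves, not a rendered image, so they can be written to disk or analyzed further on the CPU.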

You might also be interested to know that most CUDA devices can transfer data to/from the GPU concurrently with execution on the GPU (this needs cudaMemcpyAsync, streams, and page-locked host memory). With suitable double-buffering, this could let you hide the cost of memory transfers as well.
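A rough sketch of what double-buffering could look like, assuming two streams, page-locked host buffers from cudaMallocHost, and cudaMemcpyAsync. The names `offsetKernel`, `check`, and `processFramesOverlapped` are hypothetical, and the kernel is a placeholder for a real filter. How much overlap you actually get depends on the device.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for a real filter: adds an offset to every sample.
__global__ void offsetKernel(const float* in, float* out, int n, float offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + offset;
}

static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Processes consecutive frames of `frameLen` floats, alternating between two
// streams so one frame's copies can overlap the other frame's kernel.
std::vector<float> processFramesOverlapped(const std::vector<float>& frames,
                                           int frameLen, float offset)
{
    if (frameLen <= 0) return std::vector<float>();
    const int numFrames = static_cast<int>(frames.size()) / frameLen;
    const size_t bytes = frameLen * sizeof(float);
    std::vector<float> result(static_cast<size_t>(numFrames) * frameLen);

    const int kBuffers = 2;
    cudaStream_t stream[kBuffers];
    float *hIn[kBuffers], *hOut[kBuffers], *dIn[kBuffers], *dOut[kBuffers];
    for (int b = 0; b < kBuffers; ++b) {
        check(cudaStreamCreate(&stream[b]), "cudaStreamCreate");
        // Async copies only overlap with kernels when host memory is pinned.
        check(cudaMallocHost(&hIn[b], bytes), "cudaMallocHost in");
        check(cudaMallocHost(&hOut[b], bytes), "cudaMallocHost out");
        check(cudaMalloc(&dIn[b], bytes), "cudaMalloc in");
        check(cudaMalloc(&dOut[b], bytes), "cudaMalloc out");
    }

    const int threads = 256;
    const int blocks = (frameLen + threads - 1) / threads;
    for (int f = 0; f < numFrames; ++f) {
        const int b = f % kBuffers;
        if (f >= kBuffers) {
            // Buffer b still holds frame f - kBuffers; wait for it and save it.
            check(cudaStreamSynchronize(stream[b]), "sync before reuse");
            std::memcpy(&result[static_cast<size_t>(f - kBuffers) * frameLen],
                        hOut[b], bytes);
        }
        std::memcpy(hIn[b], &frames[static_cast<size_t>(f) * frameLen], bytes);
        // Upload, kernel, and download are queued in order on stream b and
        // return immediately, so the host can start queuing the next frame.
        check(cudaMemcpyAsync(dIn[b], hIn[b], bytes,
                              cudaMemcpyHostToDevice, stream[b]), "async H2D");
        offsetKernel<<<blocks, threads, 0, stream[b]>>>(dIn[b], dOut[b],
                                                        frameLen, offset);
        check(cudaGetLastError(), "kernel launch");
        check(cudaMemcpyAsync(hOut[b], dOut[b], bytes,
                              cudaMemcpyDeviceToHost, stream[b]), "async D2H");
    }

    // Collect the last frames that are still in flight.
    const int firstPending = numFrames > kBuffers ? numFrames - kBuffers : 0;
    for (int f = firstPending; f < numFrames; ++f) {
        const int b = f % kBuffers;
        check(cudaStreamSynchronize(stream[b]), "final sync");
        std::memcpy(&result[static_cast<size_t>(f) * frameLen], hOut[b], bytes);
    }

    for (int b = 0; b < kBuffers; ++b) {
        check(cudaFree(dIn[b]), "cudaFree in");
        check(cudaFree(dOut[b]), "cudaFree out");
        check(cudaFreeHost(hIn[b]), "cudaFreeHost in");
        check(cudaFreeHost(hOut[b]), "cudaFreeHost out");
        check(cudaStreamDestroy(stream[b]), "cudaStreamDestroy");
    }
    return result;
}
```

The extra host-side memcpy into and out of the pinned buffers is there only to keep the sketch self-contained. In a real pipeline you would probably have the capture or decode step write straight into the pinned buffers.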