Directly render from device memory

Hi Everyone

Maybe this is an old question, but I am wondering how to display the processed image on the screen directly from device memory, without transferring it back to host memory.

Currently my code looks like this:

{

cudaMemcpy(D, H, size, cudaMemcpyHostToDevice); // transfer in

cudaFunction(D); //device process

cudaMemcpy(H, D, size, cudaMemcpyDeviceToHost); // transfer out

Display(H); //host process

}

Apparently the last two steps waste some time because of the limited PCIe bandwidth. It would be great if I could do the following instead:

{

cudaMemcpy(D, H, size, cudaMemcpyHostToDevice); // transfer in

cudaFunction(D); //device process

cudaDisplay(D); //device display

}

I have read a little about OpenGL interop. Is that the right way to do this?

Thank you!

You can map an OpenGL Pixel Buffer Object (PBO) into CUDA’s address space using cudaGLMapBufferObject and cudaGLUnmapBufferObject (after registering it once with cudaGLRegisterBufferObject), and have a kernel read from or write to the PBO directly. You can then create a texture from the PBO and use it for rendering (on a quad that fills the screen, for example). There are samples that show how to do this, such as simpleGL, in which CUDA modifies data in a VBO.
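
For reference, here is a minimal sketch of that PBO route. It is not taken from simpleGL, and it makes several assumptions: it uses the newer cudaGraphics* interop calls (cudaGraphicsGLRegisterBuffer, cudaGraphicsMapResources and friends) in place of the cudaGLMapBufferObject family named above, which is deprecated in current toolkits; it assumes GLEW and freeglut for the window and extension loading; and processKernel, the 512x512 size and the gradient it draws are hypothetical stand-ins for your own cudaFunction and image. Error checking is left out to keep it short.

```cuda
// Sketch only: assumes GLEW + freeglut are installed and a CUDA-capable GPU drives the display.
#include <GL/glew.h>
#include <GL/freeglut.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

const int W = 512, H = 512;              // illustrative image size (assumption)
GLuint pbo = 0, tex = 0;                 // pixel buffer object and the texture fed from it
cudaGraphicsResource *pboRes = nullptr;  // CUDA handle for the registered PBO

// Hypothetical stand-in for cudaFunction: writes an RGBA gradient into the PBO.
__global__ void processKernel(uchar4 *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[y * w + x] = make_uchar4(255 * x / w, 255 * y / h, 128, 255);
}

void display()
{
    // 1. Map the PBO so the kernel can write into it; no host copy is involved.
    uchar4 *devPtr = nullptr;
    size_t bytes = 0;
    cudaGraphicsMapResources(1, &pboRes, 0);
    cudaGraphicsResourceGetMappedPointer((void **)&devPtr, &bytes, pboRes);
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    processKernel<<<grid, block>>>(devPtr, W, H);
    cudaGraphicsUnmapResources(1, &pboRes, 0);  // hand the buffer back to OpenGL

    // 2. Copy PBO -> texture on the GPU (the null pointer is an offset into the bound PBO).
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    // 3. Draw a screen-filling textured quad.
    glEnable(GL_TEXTURE_2D);
    glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex2f(-1, -1);
    glTexCoord2f(1, 0); glVertex2f( 1, -1);
    glTexCoord2f(1, 1); glVertex2f( 1,  1);
    glTexCoord2f(0, 1); glVertex2f(-1,  1);
    glEnd();
    glutSwapBuffers();
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutInitWindowSize(W, H);
    glutCreateWindow("CUDA -> PBO -> texture");
    glewInit();

    // Allocate the PBO storage; CUDA will fill it, so no initial data is supplied.
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, W * H * 4, nullptr, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    // Allocate an empty texture of the same size to receive the PBO contents.
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, W, H, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    // Register the PBO with CUDA once; WriteDiscard says CUDA overwrites the whole buffer.
    cudaGraphicsGLRegisterBuffer(&pboRes, pbo, cudaGraphicsRegisterFlagsWriteDiscard);

    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}
```

In your case the kernel would read from D (uploaded once with cudaMemcpy as before) and write its result into devPtr, so the device-to-host copy and the later re-upload for display both disappear. On Linux, something along the lines of nvcc interop.cu -lGLEW -lglut -lGL should build it, though the file name and the exact library names are assumptions that depend on your setup.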