Mixing OpenGL and CUDA

I’m using OpenGL for some GPGPU image processing, basically composition and scaling.

Since this is fairly simple to implement in OpenGL, I chose that over writing CUDA kernels.

However, my OpenGL solution has some copying overhead which I believe CUDA avoids.

From my understanding, with OpenGL it works something like this:

Note: “not in OGL context thread” is a limitation in my application where images are created by various active objects (renderers).

[i]allocate system memory (not in OGL context thread)
memcpy from some image source to system memory (not in OGL context thread)
memcpy from system memory to the mapped PBO (locked system memory) // this copy is where I have a large overhead

start async DMA transfer to OGL texture for previous image (double buffered)
draw fullscreen quad

… read back into system memory from framebuffer [/i]
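
To make the steps above concrete, here is a rough sketch of the upload part as I understand it. It is host-side code only and assumes a current OpenGL context, GLEW already initialized, two PBOs and a texture that were created elsewhere, and 8-bit BGRA frames. The struct and function names are made up for illustration.

```cuda
#include <GL/glew.h>
#include <cstddef>
#include <cstring>

// Hypothetical state for the double-buffered upload (names are illustrative).
struct PboUploader {
    GLuint pbo[2];       // two pixel unpack buffers, created elsewhere
    GLuint tex;          // destination texture, already allocated as RGBA8
    int width, height;   // frame size in pixels
    unsigned frame;      // number of frames submitted so far
};

// Must be called from the thread that owns the OpenGL context.
// `src` is the system-memory image filled by a renderer thread.
void upload_frame(PboUploader& u, const void* src) {
    const std::size_t bytes = std::size_t(u.width) * u.height * 4; // assumed BGRA8
    const unsigned fill = u.frame % 2;        // PBO that receives this frame
    const unsigned send = (u.frame + 1) % 2;  // PBO that holds the previous frame

    // The extra copy: system memory -> mapped PBO memory.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, u.pbo[fill]);
    // Re-specify the storage so the driver does not have to wait for a pending transfer.
    glBufferData(GL_PIXEL_UNPACK_BUFFER, GLsizeiptr(bytes), nullptr, GL_STREAM_DRAW);
    if (void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY)) {
        std::memcpy(dst, src, bytes);
        glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    }

    // Start the transfer of the previous frame from its PBO into the texture.
    if (u.frame > 0) {
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, u.pbo[send]);
        glBindTexture(GL_TEXTURE_2D, u.tex);
        // With a PBO bound, the last argument is an offset into the buffer.
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, u.width, u.height,
                        GL_BGRA, GL_UNSIGNED_BYTE, nullptr);
    }

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    ++u.frame;
}
```

The memcpy into the mapped PBO is the copy I would like to get rid of, since the image has already been copied into system memory once by the renderer.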

If I understood correctly, CUDA avoids the extra memory copy by letting me allocate page-locked (pinned) system memory directly, which can then be moved to the GPU with an async DMA transfer.
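
As far as I can tell, the CUDA side of this would look roughly like the sketch below. It uses only the runtime API (cudaHostAlloc, cudaMemcpyAsync); the frame size and the fill pattern are placeholders, and I am assuming the cudaHostAllocPortable flag is what makes the allocation usable from a thread or context other than the one that allocated it.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Print the failing call and bail out; kept minimal on purpose.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         cudaGetErrorString(err_));                     \
            return 1;                                                   \
        }                                                               \
    } while (0)

int main() {
    const std::size_t bytes = std::size_t(1920) * 1080 * 4; // assumed BGRA8 frame

    unsigned char* host = nullptr;  // page-locked system memory
    unsigned char* dev = nullptr;   // device memory
    cudaStream_t stream;

    // Page-locked allocation; "portable" so other CUDA contexts also treat it as pinned.
    CHECK(cudaHostAlloc((void**)&host, bytes, cudaHostAllocPortable));
    CHECK(cudaMalloc((void**)&dev, bytes));
    CHECK(cudaStreamCreate(&stream));

    // Stands in for the memcpy from an image source (done by a renderer thread).
    std::memset(host, 0x7f, bytes);

    // Asynchronous host-to-device copy; it can only overlap with other work
    // because the source is page-locked.
    CHECK(cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaStreamSynchronize(stream)); // wait before reusing `host`

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dev));
    CHECK(cudaFreeHost(host));
    return 0;
}
```

Here the renderer writes straight into the pinned buffer, so there is only one CPU-side copy before the DMA transfer.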

So what I’d like to do is something like:

[i]allocate CUDA locked system memory (not in OGL context thread)
memcpy from some image source to CUDA locked system memory (not in OGL context thread)
start async DMA transfer to OGL texture
draw fullscreen quad

… read back into system memory from framebuffer [/i]
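
The closest thing I have found to the steps above is CUDA's OpenGL interop: register a PBO with CUDA, copy from the pinned buffer into the mapped PBO, then let OpenGL source the texture from that PBO. Below is an untested sketch of that idea, not a known-good solution. It assumes a current OpenGL context with GLEW initialized, a texture created elsewhere, 8-bit BGRA frames, and that the map/copy/unmap calls run in the OGL context thread (only the pinned allocation and fill happen in the renderer threads). All struct and function names are hypothetical.

```cuda
#include <GL/glew.h>          // must come before any other GL header
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <cstddef>

// Hypothetical per-texture state (names are illustrative).
struct InteropUpload {
    GLuint pbo = 0;
    GLuint tex = 0;                        // created elsewhere as RGBA8
    cudaGraphicsResource_t res = nullptr;  // CUDA handle for the PBO
    cudaStream_t stream = nullptr;
    int width = 0, height = 0;
};

// Call once from the OGL context thread.
bool init_upload(InteropUpload& u, GLuint tex, int width, int height) {
    u.tex = tex; u.width = width; u.height = height;
    const std::size_t bytes = std::size_t(width) * height * 4; // assumed BGRA8

    glGenBuffers(1, &u.pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, u.pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, GLsizeiptr(bytes), nullptr, GL_STREAM_DRAW);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    if (cudaStreamCreate(&u.stream) != cudaSuccess) return false;
    // WriteDiscard: CUDA only writes the buffer and overwrites all of it.
    return cudaGraphicsGLRegisterBuffer(&u.res, u.pbo,
               cudaGraphicsRegisterFlagsWriteDiscard) == cudaSuccess;
}

// Call per frame from the OGL context thread.
// `pinned` is a buffer from cudaHostAlloc that a renderer thread has filled.
bool upload(InteropUpload& u, const void* pinned) {
    const std::size_t bytes = std::size_t(u.width) * u.height * 4;

    if (cudaGraphicsMapResources(1, &u.res, u.stream) != cudaSuccess) return false;

    void* dptr = nullptr;
    std::size_t mapped = 0;
    bool ok = cudaGraphicsResourceGetMappedPointer(&dptr, &mapped, u.res) == cudaSuccess
              && mapped >= bytes;
    if (ok) {
        // Async DMA from pinned system memory into the PBO's storage.
        ok = cudaMemcpyAsync(dptr, pinned, bytes,
                             cudaMemcpyHostToDevice, u.stream) == cudaSuccess;
    }
    // Unmapping orders the CUDA copy before GL work issued afterwards.
    if (cudaGraphicsUnmapResources(1, &u.res, u.stream) != cudaSuccess) ok = false;
    if (!ok) return false;

    // Texture update sourced from the PBO, followed by the fullscreen quad elsewhere.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, u.pbo);
    glBindTexture(GL_TEXTURE_2D, u.tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, u.width, u.height,
                    GL_BGRA, GL_UNSIGNED_BYTE, nullptr);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    return true;
}
```

What I cannot tell from the documentation is whether the copy into the PBO plus the PBO-to-texture update is actually cheaper than my current memcpy into a mapped PBO.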

Is this possible? Is it a good idea? If so, I would appreciate it if someone could point me in the right direction.

Thank you