I’d like to code a kernel that processes video and doesn’t return any data to the CPU, but instead displays the processed result in a window. In other words, a complex CUDA-based video renderer, like VMR9.
It doesn’t need to be a DirectShow filter or a DMO, although that would be really nice; displaying in my own window is enough for now.
The processing kernel in question can be, for example, deinterlacing or advanced resizing.
Any ideas on how to do that? How can I use CUDA and display the result without a costly (and pointless) trip to main memory?
Apologies, it seems that everything I need is in sections B5 and B6. I have no idea how I missed that. Moreover, there is a fluids example ~
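For anyone who finds this thread later, here is a minimal sketch of the OpenGL interop route. It is not taken from the guide or from the fluids example. It uses the graphics-interop calls from the current CUDA runtime (`cudaGraphicsGLRegisterBuffer` and friends); older toolkits expose the same idea through `cudaGLRegisterBufferObject` / `cudaGLMapBufferObject`, so the names in the guide sections may differ. The function names `processFrame`, `registerPbo` and `renderFrame` are made up for illustration, the invert operation is only a placeholder for real deinterlacing or resizing, and I am assuming 8-bit RGBA frames, an existing GL context, and an already created pixel buffer object.

```cuda
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>   // pulls in the GL types (GLuint) and the interop API

// Placeholder processing step (hypothetical): inverts RGB. A real deinterlacer
// or resizer would go here. src is the input frame in device memory, dst is
// the mapped OpenGL pixel buffer.
__global__ void processFrame(const uchar4 *src, uchar4 *dst, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    uchar4 p = src[y * width + x];
    dst[y * width + x] = make_uchar4(255 - p.x, 255 - p.y, 255 - p.z, p.w);
}

// Call once, after the GL context and the pixel buffer object (pbo) exist.
cudaError_t registerPbo(GLuint pbo, cudaGraphicsResource_t *resource)
{
    // CUDA overwrites the whole buffer every frame, so its old contents can be discarded.
    return cudaGraphicsGLRegisterBuffer(resource, pbo, cudaGraphicsRegisterFlagsWriteDiscard);
}

// Call once per frame. devSrc must already hold the input frame in device memory.
cudaError_t renderFrame(cudaGraphicsResource_t resource, const uchar4 *devSrc,
                        int width, int height)
{
    uchar4 *devDst = nullptr;
    size_t numBytes = 0;

    // While mapped, the buffer belongs to CUDA and OpenGL must not touch it.
    cudaError_t err = cudaGraphicsMapResources(1, &resource, 0);
    if (err != cudaSuccess) return err;

    err = cudaGraphicsResourceGetMappedPointer((void **)&devDst, &numBytes, resource);
    if (err == cudaSuccess) {
        if (numBytes < (size_t)width * height * sizeof(uchar4)) {
            err = cudaErrorInvalidValue;   // PBO is too small for this frame
        } else {
            dim3 block(16, 16);
            dim3 grid((width + block.x - 1) / block.x,
                      (height + block.y - 1) / block.y);
            processFrame<<<grid, block>>>(devSrc, devDst, width, height);
            err = cudaGetLastError();
        }
    }

    // Always hand the buffer back to OpenGL, even if something failed above.
    cudaError_t unmapErr = cudaGraphicsUnmapResources(1, &resource, 0);
    return (err != cudaSuccess) ? err : unmapErr;
}
```

After `renderFrame` returns, the window code can bind the same buffer as `GL_PIXEL_UNPACK_BUFFER`, copy it into a texture with `glTexSubImage2D`, and draw a textured quad. The processed frame stays in video memory the whole time; only the source frame has to be uploaded.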