How can I output a buffer from a kernel as a pixel buffer to a given monitor connected to that GPU? I would like to output it directly, without bringing it back to the CPU.
Use the CUDA-graphics interop sample codes as an example. If you use OpenGL, look for the sample codes that use an OpenGL pixel buffer object (PBO) as the interop vehicle.
Here’s a condensed example:
Sorting Pixels from opengl using CUDA and Thrust - Stack Overflow
The use of PIXEL_UNPACK_BUFFER_ARB there indicates the use of an OpenGL PBO.
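For reference, here is a rough sketch of the PBO approach. It is not taken from the linked answer or from the CUDA samples. It assumes freeglut and GLEW are installed and that the GL context is created on the same GPU that runs CUDA. The 512x512 size, the `fill` kernel, and the names `pbo`, `tex`, and `pboRes` are made up for illustration. The kernel writes straight into the mapped PBO, and OpenGL then copies the PBO into a texture and draws it, so the pixels never pass through host memory.

```cuda
// Sketch only: build with something like
//   nvcc interop.cu -o interop -lGL -lGLEW -lglut
#include <GL/glew.h>
#include <GL/freeglut.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

const int W = 512, H = 512;               // hypothetical image size
GLuint pbo = 0, tex = 0;                  // GL pixel buffer object and texture
cudaGraphicsResource *pboRes = nullptr;   // CUDA handle for the registered PBO

// Hypothetical kernel: writes a moving RGBA gradient into the pixel buffer.
__global__ void fill(uchar4 *pix, int w, int h, int frame) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    pix[y * w + x] = make_uchar4((x + frame) & 255, y & 255, 128, 255);
}

void display() {
    static int frame = 0;
    uchar4 *dptr = nullptr;
    size_t bytes = 0;

    // Map the PBO so CUDA can write to it, run the kernel, then unmap.
    cudaGraphicsMapResources(1, &pboRes, 0);
    cudaGraphicsResourceGetMappedPointer((void **)&dptr, &bytes, pboRes);
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    fill<<<grid, block>>>(dptr, W, H, frame++);
    cudaGraphicsUnmapResources(1, &pboRes, 0);

    // With a PBO bound to GL_PIXEL_UNPACK_BUFFER, the last argument of
    // glTexSubImage2D is an offset into the PBO, not a host pointer.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    // Draw the texture on a quad covering the whole window.
    glEnable(GL_TEXTURE_2D);
    glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex2f(-1, -1);
    glTexCoord2f(1, 0); glVertex2f( 1, -1);
    glTexCoord2f(1, 1); glVertex2f( 1,  1);
    glTexCoord2f(0, 1); glVertex2f(-1,  1);
    glEnd();
    glutSwapBuffers();
    glutPostRedisplay();
}

int main(int argc, char **argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutInitWindowSize(W, H);
    glutCreateWindow("CUDA-GL PBO sketch");
    glewInit();

    // Allocate the PBO on the GPU and register it with CUDA.
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, W * H * 4, nullptr, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    cudaGraphicsGLRegisterBuffer(&pboRes, pbo,
                                 cudaGraphicsRegisterFlagsWriteDiscard);

    // Texture that receives the PBO contents each frame.
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, W, H, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}
```

Error checking is left out to keep it short; in real code, check the return value of every CUDA call. Which monitor the window appears on is decided by the windowing system, not by CUDA, so placing it on a specific display (or making it full screen) is a separate step.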
I am not using OpenGL, is there a pure CUDA solution?
No. You would have to use a CUDA-graphics interop method. That would be CUDA+OpenGL, CUDA+DirectX, or very recently, CUDA+Vulkan.