I have a beginner’s question about the general capabilities of CUDA.

I was wondering if it’s possible to write a program that renders some triangles to two or more sets of buffers and then performs arbitrary operations or calculations on those buffers, e.g. subtracting the pixel values of one from the other, or adding them all up, with everything done on the device and in parallel where logical.

// Setup
 copy tri1 and tri2 from host to device

// Rendering
 render tri1 to buf1
 render tri2 to buf2

// Analysis
 for pixel1 in buf1 and pixel2 in buf2
    result += pixel1 - pixel2

// Finish
 copy result to host

It would seem to me that it should be possible (or perhaps it is obvious, and the entire point of CUDA?) to perform both the rendering and the subsequent analysis on the device, and to parallelize them.

Is there perhaps some sample code doing just that?


Answering my own question, the sample code “Simple GL” seems to do what I described.
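
For reference, here is a minimal sketch of the pseudocode above in plain CUDA, without any OpenGL interop. It is my own illustration and is not taken from the sample. The names Tri, edge, renderTri, diffSum and renderAndCompare are made up, the "rendering" is just a one-thread-per-pixel coverage test that writes 1 or 0, and the sum uses a simple atomicAdd rather than a tuned reduction. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical triangle type: three 2D vertices in pixel coordinates.
struct Tri { float x0, y0, x1, y1, x2, y2; };

// Edge function: its sign tells on which side of the line a->b the point p lies.
__device__ float edge(float ax, float ay, float bx, float by, float px, float py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Rendering: one thread per pixel, writes 1 if the pixel centre is covered, else 0.
__global__ void renderTri(Tri t, int* buf, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float px = x + 0.5f, py = y + 0.5f;
    float e0 = edge(t.x0, t.y0, t.x1, t.y1, px, py);
    float e1 = edge(t.x1, t.y1, t.x2, t.y2, px, py);
    float e2 = edge(t.x2, t.y2, t.x0, t.y0, px, py);
    // Accept either winding order.
    bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
    buf[y * w + x] = inside ? 1 : 0;
}

// Analysis: result += pixel1 - pixel2, accumulated on the device.
__global__ void diffSum(const int* buf1, const int* buf2, int n, int* result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(result, buf1[i] - buf2[i]);
}

// Host driver: only the two triangles go in and a single int comes back.
int renderAndCompare(Tri tri1, Tri tri2, int w, int h) {
    int n = w * h;
    int *buf1 = nullptr, *buf2 = nullptr, *dResult = nullptr;
    cudaMalloc(&buf1, n * sizeof(int));
    cudaMalloc(&buf2, n * sizeof(int));
    cudaMalloc(&dResult, sizeof(int));
    cudaMemset(dResult, 0, sizeof(int));

    // Setup + rendering: the triangles are copied to the device as kernel arguments.
    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    renderTri<<<grid, block>>>(tri1, buf1, w, h);
    renderTri<<<grid, block>>>(tri2, buf2, w, h);

    // Analysis: both buffers stay on the device.
    diffSum<<<(n + 255) / 256, 256>>>(buf1, buf2, n, dResult);

    // Finish: copy the single result back (this copy waits for the kernels).
    int result = 0;
    cudaMemcpy(&result, dResult, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(buf1);
    cudaFree(buf2);
    cudaFree(dResult);
    return result;
}
```

Calling renderAndCompare(a, b, 512, 512) from main with two Tri values would then give the difference in covered pixel counts. For large buffers a proper parallel reduction would probably be faster than one atomicAdd per pixel, but the point is only that nothing except the final int has to leave the device.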