My OpenGL GPGPU app uses a geometry shader with transform feedback (rasterization turned off, rendering from one VBO into another), so each pass can reduce or expand the data in the array. I also make use of the ability to read data from adjacent vertices.
I’m very new to CUDA. I want to like it (driver-wise, with multiple GPUs on Windows, it looks very attractive), but the programming model seems a little rigid - i.e. you set up your kernel function, and then you have to specify the size of the thread block (?). Is there a way of using CUDA more like rendering from one array to another, with the ability to reduce/expand the data?
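For context on what I mean by "rendering from one array to another": as far as I understand it, the closest CUDA equivalent is a kernel where each thread handles one element, with the block size chosen explicitly at launch. This is just a sketch with made-up names (`passKernel`, `d_in`, `d_out`), not code from my app:

```
// One thread per element: the CUDA analogue of a 1:1 pass
// from one VBO into another. Names here are illustrative.
__global__ void passKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n)
        out[i] = in[i] * 2.0f;                      // the per-element "shader"
}

// Launch: the block size is an explicit tuning choice (256 is a
// common default); the grid size is derived from the array length.
// int threads = 256;
// int blocks  = (n + threads - 1) / threads;
// passKernel<<<blocks, threads>>>(d_in, d_out, n);
```

The fixed block size is what I mean by "rigid" - there's no built-in notion of a variable-length output stream the way transform feedback gives you.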
If not, maybe it’s something that could be incorporated in the future: shader-like functionality, where you specify an input buffer, an output buffer, a ‘shader’ function, and two structs describing how the input/output arrays are laid out.
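One workaround I've seen suggested for the reduce/expand part is stream compaction with a global atomic counter: each thread decides how many outputs it produces and reserves slots with `atomicAdd`, a bit like a geometry shader emitting a variable number of vertices. A hypothetical sketch (all names made up, and note the output order is not preserved):

```
// Variable output per element via a global counter, mimicking a
// geometry shader's 0..N emitted vertices. Illustrative only.
__global__ void expandKernel(const float *in, float *out,
                             unsigned int *outCount, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Decide how many outputs this element emits (0 or 2 here).
    int emit = (in[i] > 0.0f) ? 2 : 0;
    if (emit == 0) return;

    // Reserve 'emit' contiguous slots in the output array.
    unsigned int base = atomicAdd(outCount, emit);
    for (int k = 0; k < emit; ++k)
        out[base + k] = in[i];
}
```

A prefix-sum (scan) pass would give ordered output, but that's exactly the kind of plumbing I'd hope a shader-like API could hide.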