CUDA vs. OpenGL for simple imaging shaders

Let's say I want to write a simple image-filtering algorithm; for the sake of argument, say a Gaussian blur.

I can do this very easily and painlessly by writing an OpenGL shader. I know that OpenGL will load my GPU efficiently, coalesce memory accesses, etc., without my having to worry about it.

I can do the same thing in CUDA, but it is much harder, and I have to be careful to get everything just right.

My question: if the algorithm you want can be written as a shader (like the blur), is it worth writing the CUDA version? Will it likely be faster?

I understand that CUDA is much more flexible; that is not the question. At some point you have to go to CUDA, I get that.

But I was wondering about these tradeoffs: are simple imaging operations just as good in OpenGL, or will the CUDA implementation be much faster?


(and yes, I understand that I can just try it… but maybe someone already has??)

For simple per-pixel operations (e.g. color conversion), there is not much difference.

For filters like Gaussian blur that can take advantage of shared memory in CUDA, we have measured up to a 2x performance improvement over OpenGL.
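To make the shared-memory point concrete, here is a minimal sketch (not code from the measurement above, and all names, tile sizes, and radii are made up) of the usual tiling pattern for one horizontal pass of a separable blur: each block stages a row segment plus its halo into shared memory once, then every tap reads from shared memory instead of issuing another global load.

```cuda
// Hypothetical sketch: horizontal pass of a separable blur using a
// shared-memory tile. RADIUS, TILE_W, and all names are illustrative.
#define RADIUS 4
#define TILE_W 128  // blockDim.x is assumed to equal TILE_W

__global__ void blurRowSmem(const float* in, float* out,
                            int width, int height,
                            const float* weights /* 2*RADIUS+1 taps */)
{
    __shared__ float tile[TILE_W + 2 * RADIUS];

    int y = blockIdx.y;                       // one row per block in y
    int x = blockIdx.x * TILE_W + threadIdx.x; // output column of this thread

    // Stage the tile plus its left/right halo, clamping at the image border.
    // Each global pixel is loaded once per block instead of once per tap.
    int gx = min(max(x - RADIUS, 0), width - 1);
    tile[threadIdx.x] = in[y * width + gx];
    if (threadIdx.x < 2 * RADIUS) {
        int gx2 = min(max(x - RADIUS + TILE_W, 0), width - 1);
        tile[threadIdx.x + TILE_W] = in[y * width + gx2];
    }
    __syncthreads();

    if (x < width) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += weights[k + RADIUS] * tile[threadIdx.x + RADIUS + k];
        out[y * width + x] = sum;
    }
}
```

The win is the amortization: without the tile, a (2*RADIUS+1)-tap filter reads each input pixel from global memory once per tap; with it, each pixel is read roughly once per block (plus the small halo overlap). A GLSL fragment shader has no equivalent of that explicit staging and relies on the texture cache instead.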

Thanks for that experience; it was just what I was wondering. I'll have to think about why OpenGL doesn't use shared memory as efficiently. Probably it relies on general texture caching? It must come down to the generality of per-pixel operations somehow…

But thanks! Does the difference go away as the computation per pixel access goes up?

OpenGL fragment shaders don't have shared memory (there is no concept of a thread block).

Yes, the performance benefit of shared memory depends on how memory-bandwidth-limited the kernel is (imaging kernels often are), and it grows with the number of times the data in shared memory is re-used.