CUDA vs. pixel shader: a list of reasons for one over the other

Hey guys.

Why would one use CUDA for per-pixel processing when you can use a regular Cg/HLSL pixel shader? In other words, what can you do with a GPGPU language that you can’t do with a pixel shader on top of a fullscreen quad?

It might sound obvious, but I was still hoping to brainstorm and get a list of reasons going, in addition to the following:

  • A GPGPU language gives you more control over GPU/PCB resources
  • Can set the number of threads
  • Can set the number of blocks
  • Can optimize using the different memory spaces available
  • Can customize thread load balancing as one chooses
  • etc?
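For the thread and block counts in the list above, here is roughly what choosing them explicitly looks like for a per-pixel job. A minimal sketch, assuming an 8-bit grayscale image stored row-major on the host; `invert_kernel` and `invert_image` are hypothetical names, and error checking of the CUDA calls is left out for brevity:

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// One thread per pixel, much like one pixel-shader invocation per fragment,
// except that the host code picks the block and grid shape.
__global__ void invert_kernel(const unsigned char* in, unsigned char* out,
                              int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {  // the grid may overshoot the image edges
        int i = y * width + x;
        out[i] = 255 - in[i];
    }
}

// Hypothetical host wrapper: copies the image to the GPU, launches the
// kernel with an explicit 2D configuration, and copies the result back.
std::vector<unsigned char> invert_image(const std::vector<unsigned char>& img,
                                        int width, int height) {
    assert(img.size() == static_cast<size_t>(width) * height);
    size_t bytes = img.size();
    unsigned char* d_in = nullptr;
    unsigned char* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // 256 threads per block: a tunable choice, not a rule
    dim3 grid((width + block.x - 1) / block.x,    // round up so every
              (height + block.y - 1) / block.y);  // pixel gets a thread
    invert_kernel<<<grid, block>>>(d_in, d_out, width, height);

    std::vector<unsigned char> result(bytes);
    // Synchronous copy on the default stream, so it waits for the kernel.
    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

The 16×16 block is just a common starting point; with a fullscreen quad the driver makes this decision for you.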

A related thread, though it does not have much information:

One reason may be that CUDA is not available on Intel integrated graphics, whereas HLSL and the OpenGL Shading Language (GLSL) are supported there. If you can achieve acceleration with a pixel shader even on low-spec hardware, it definitely makes sense to keep a code path for it, just in case CUDA is not available.
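Such a fallback can be picked at run time. A minimal sketch, assuming the application ships both a CUDA path and a shader path and links the CUDA runtime statically (nvcc's default), so the binary still starts on machines without the NVIDIA driver; `Backend` and `choose_backend` are hypothetical names, and only `cudaGetDeviceCount` comes from the CUDA runtime API:

```cuda
#include <cuda_runtime.h>

// Hypothetical enum naming the two code paths an application might keep.
enum class Backend { Cuda, PixelShader };

// Picks the CUDA path only when the runtime reports a usable device.
// cudaGetDeviceCount returns an error code (for example when no NVIDIA
// GPU or driver is present), in which case we fall back to the shader path.
Backend choose_backend() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        return Backend::PixelShader;  // e.g. Intel integrated graphics
    }
    return Backend::Cuda;
}
```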

With CUDA, you can write to arbitrary memory locations. With a shader, you’re limited to writing to the pixel you’re processing. That right there is probably the single biggest advantage of CUDA. Yes, there are workarounds with vertex/geometry shaders, but even in the best case, these will implicitly flush the instruction buffer and texture caches, require communication with the CPU, take a detour through triangle setup, etc. Can we say context switches? Thus, for scatter, it’s clear that CUDA is vastly more efficient.
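A histogram is a typical scatter: each pixel increments a bin chosen by its value instead of writing to its own output location. A minimal sketch, assuming 8-bit pixels and 256 bins; `histogram_kernel` and `histogram` are hypothetical names, and error checking is left out:

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Scatter: the write address (bins[pixels[i]]) depends on the data read,
// which a pixel shader cannot express without the vertex/geometry detour.
__global__ void histogram_kernel(const unsigned char* pixels, int n,
                                 unsigned int* bins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(&bins[pixels[i]], 1u);  // many threads may hit the same bin
    }
}

// Hypothetical host wrapper returning a 256-bin histogram of the image.
std::vector<unsigned int> histogram(const std::vector<unsigned char>& img) {
    assert(!img.empty());  // a zero-block launch would be an invalid config
    int n = static_cast<int>(img.size());
    size_t bin_bytes = 256 * sizeof(unsigned int);
    unsigned char* d_pixels = nullptr;
    unsigned int* d_bins = nullptr;
    cudaMalloc(&d_pixels, n);
    cudaMalloc(&d_bins, bin_bytes);
    cudaMemcpy(d_pixels, img.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, bin_bytes);  // all bins start at zero

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    histogram_kernel<<<blocks, threads>>>(d_pixels, n, d_bins);

    std::vector<unsigned int> bins(256);
    cudaMemcpy(bins.data(), d_bins, bin_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_pixels);
    cudaFree(d_bins);
    return bins;
}
```

`atomicAdd` resolves collisions when many pixels land in the same bin; accumulating per-block histograms in shared memory first is a common refinement, but it is beyond this sketch.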

Basically, the trade-off is simple:

CUDA can do anything a shader can, and some things a shader can’t, almost always as fast or even a bit faster.

But CUDA only works on NVIDIA GPUs.

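I would add that shared memory is the other big advantage of CUDA/OpenCL over pixel shaders. Threads in a block can load an overlapping pixel neighborhood from global memory once and then reuse it from fast on-chip memory, which can provide big performance improvements in bandwidth-limited image processing algorithms such as convolutions.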
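To illustrate that reuse, here is a 3-tap box blur over a 1D signal (one image row, say) in which each block stages its span plus a one-element halo in shared memory. A minimal sketch, assuming the kernel is always launched with `BLOCK` threads per block and clamp-to-edge borders; `RADIUS`, `BLOCK`, `load_clamped`, `box_blur_1d` and `box_blur` are hypothetical names, and error checking is left out:

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

constexpr int RADIUS = 1;   // 3-tap box filter (illustrative choice)
constexpr int BLOCK = 256;  // threads per block; the kernel assumes this

// Clamp-to-edge read, similar to a texture sampler in clamp mode.
__device__ float load_clamped(const float* in, int idx, int n) {
    idx = idx < 0 ? 0 : (idx >= n ? n - 1 : idx);
    return in[idx];
}

__global__ void box_blur_1d(const float* in, float* out, int n) {
    // The block's span plus a RADIUS-wide halo on each side.
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int gi = blockIdx.x * BLOCK + threadIdx.x;  // global index
    int li = threadIdx.x + RADIUS;              // index inside the tile

    // Each input element is fetched from global memory once per block...
    tile[li] = load_clamped(in, gi, n);
    if (threadIdx.x < RADIUS) {
        tile[li - RADIUS] = load_clamped(in, gi - RADIUS, n);  // left halo
        tile[li + BLOCK] = load_clamped(in, gi + BLOCK, n);    // right halo
    }
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    // ...and then read 2*RADIUS+1 times from on-chip shared memory.
    if (gi < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k) {
            sum += tile[li + k];
        }
        out[gi] = sum / (2 * RADIUS + 1);
    }
}

// Hypothetical host wrapper: blurs the whole signal on the GPU.
std::vector<float> box_blur(const std::vector<float>& signal) {
    int n = static_cast<int>(signal.size());
    assert(n > 0);
    size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, signal.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + BLOCK - 1) / BLOCK;
    box_blur_1d<<<blocks, BLOCK>>>(d_in, d_out, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

With a radius of 1 the saving is small; it grows with the filter radius, because each thread would otherwise issue 2*RADIUS+1 global reads (caches on newer GPUs narrow the gap somewhat).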

These are perhaps the most important points. There is a lot more transparency with CUDA. You know which magic numbers (block size, shared memory usage and so on) affect performance, and you can tune your code accordingly to get the most out of the video card.
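Some of those numbers can even be queried from the runtime. A minimal sketch, assuming CUDA 6.5 or later, where `cudaOccupancyMaxPotentialBlockSize` suggests a block size based on a kernel's register and shared memory usage; `scale_kernel` and `suggested_block_size` are hypothetical names:

```cuda
#include <cuda_runtime.h>

// A trivial kernel whose launch configuration we want to tune.
__global__ void scale_kernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Returns the block size the runtime suggests for maximum occupancy of
// scale_kernel on the current device, or 0 if the query fails.
int suggested_block_size() {
    int min_grid_size = 0;  // smallest grid that can reach full occupancy
    int block_size = 0;
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &min_grid_size, &block_size, scale_kernel, 0, 0);
    return err == cudaSuccess ? block_size : 0;
}
```

Maximum occupancy is not always maximum speed, so the suggestion is a starting point for tuning rather than a final answer.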