GLSL vs CUDA programs

Hi everybody!

I am running some performance tests on CUDA programs, and I would like to see how they really compare with equivalent GLSL programs.

Where can I find GLSL examples of, say, matrix multiplication? I do not see how to do it…

Thanks.

I assume that by matrix multiplication you mean large, dense matrix-matrix multiply.

The short answer is: no one goes the painful route of what I like to call legacy GPGPU (programming through graphics APIs) any more, so you probably won’t find examples like that.

The long answer is: some things that CUDA/CL expose (like shared memory) are not exposed in DX/GL. So the comparison you envision for non-graphics apps is really a comparison of the GLSL compiler against manually tuned code. For some special cases such comparisons make sense nonetheless; I have a few examples in my thesis (not public yet, I’ll defend soon). In short:

  • vector-vector addition: no difference (no surprise)
  • norms/dot products/reductions of any kind: CUDA wins big time (this used to be in the CUDA SDK as the reduction example, with slides somewhere)
  • same for scan/prefix sum (there’s at least one paper by Harris, Sengupta, Owens et al.; I think it’s also in GPU Gems 3, and I can dig up the citations if you need them). The difference comes from the greater flexibility exposed by CUDA.
  • I have a few example kernels that are more efficient in GLSL (geometric multigrid, prolongation step), but that’s probably because I haven’t thought the CUDA variant through fully yet. I have been planning to for at least 2 years; so much to do, so little time
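
To make the reduction point concrete, here is a minimal sketch of a sum reduction that uses the shared memory mentioned above. It is not the CUDA SDK sample and it is not tuned. The names `BLOCK_SIZE`, `block_sum` and `sum_on_gpu` are made up for illustration, the block size of 256 is an assumption, and error checking is left out to keep it short. Each block reduces its slice in shared memory and writes one partial sum; the few partial sums are then added on the host.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed block size for this sketch; it must be a power of two
// for the halving loop below to cover every element.
constexpr int BLOCK_SIZE = 256;

// Each block sums BLOCK_SIZE inputs and writes one partial result.
__global__ void block_sum(const float* in, float* partial, int n) {
    __shared__ float tile[BLOCK_SIZE];  // per-block shared memory

    int tid = threadIdx.x;
    int i = blockIdx.x * BLOCK_SIZE + tid;

    // Load one element per thread; pad with zeros past the end.
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int stride = BLOCK_SIZE / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            tile[tid] += tile[tid + stride];
        }
        __syncthreads();  // all threads reach this, active or not
    }

    if (tid == 0) {
        partial[blockIdx.x] = tile[0];
    }
}

// Hypothetical host helper: copies data over, launches the kernel,
// and finishes the sum of the per-block partials on the CPU.
float sum_on_gpu(const std::vector<float>& host) {
    int n = static_cast<int>(host.size());
    if (n == 0) return 0.0f;

    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    float* d_in = nullptr;
    float* d_partial = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK_SIZE>>>(d_in, d_partial, n);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `sum_on_gpu(std::vector<float>(1000, 1.0f))` from a `main` should give 1000. With the legacy GPGPU approach, the same reduction is typically done as a series of render passes that each shrink a texture, which is one place the extra flexibility shows up.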