Hi everybody!
I am running some performance tests on CUDA programs, but I also want to see the real difference compared to GLSL programs.
Where can I find some examples of GLSL doing, for example, matrix multiplication? I do not see how to do it…
Thanks.
I assume that by matrix multiplication you mean a large dense matrix-matrix multiply.
The short answer is: no one goes through the painful route of what I like to call legacy GPGPU (programming through graphics APIs) any more, so you probably won’t find examples like that.
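For what it’s worth, this is roughly what such a legacy-GPGPU matrix multiply looks like: store A and B as single-channel float textures, draw a full-screen quad into a framebuffer the size of C, and have a fragment shader compute one element of C per fragment. A minimal sketch in old-style GLSL (the uniform names and the one-float-per-texel packing are my own choices, assuming square N×N matrices):

```glsl
// Fragment shader: each fragment computes one element C[row][col].
// A and B are N x N matrices, one float per texel (e.g. GL_LUMINANCE32F).
uniform sampler2D matA;   // matrix A as a texture
uniform sampler2D matB;   // matrix B as a texture
uniform float N;          // matrix dimension

void main() {
    vec2 texel = gl_TexCoord[0].st;   // (col, row) of C, in [0,1]
    float sum = 0.0;
    for (float k = 0.0; k < N; k += 1.0) {
        float a = texture2D(matA, vec2((k + 0.5) / N, texel.t)).r; // A[row][k]
        float b = texture2D(matB, vec2(texel.s, (k + 0.5) / N)).r; // B[k][col]
        sum += a * b;
    }
    gl_FragColor = vec4(sum);
}
```

Note that every operand has to round-trip through texture fetches, and there is no way to cooperate between fragments, which is exactly why nobody writes it this way any more.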
The long answer is: some hardware features that are exposed in CUDA/CL are not exposed in DX/GL (most notably shared memory). So the comparison you envision for non-graphics apps is really a comparison of the GLSL compiler against manually tuned code. For some special cases such comparisons still make sense; I have a few examples in my thesis (not public yet, I’ll defend soon). In short:
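To make the shared-memory point concrete, this is the kind of kernel you can write in CUDA but cannot express through GLSL: a tiled matrix multiply that stages sub-blocks of A and B in on-chip shared memory, so each global-memory element is read only N/TILE times instead of N times. A sketch along the lines of the SDK’s matrixMul example (assumes N is a multiple of TILE; this is my simplified version, not code from the thesis):

```cuda
#define TILE 16

// C = A * B for square N x N row-major matrices.
// Launch with dim3 block(TILE, TILE) and dim3 grid(N/TILE, N/TILE).
__global__ void matmul_tiled(const float *A, const float *B, float *C, int N)
{
    __shared__ float As[TILE][TILE];   // staging tile of A
    __shared__ float Bs[TILE][TILE];   // staging tile of B

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Cooperatively load one tile of A and one tile of B.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();   // wait until the tiles are fully loaded

        for (int k = 0; k < TILE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();   // don't overwrite tiles while others still read
    }
    C[row * N + col] = sum;
}
```

The cooperative loads plus `__syncthreads()` barriers have no GLSL equivalent; a fragment shader can only re-fetch everything from textures.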
- vector-vector addition: no difference (no surprise)
- norms/dot products/reductions of any kind: CUDA wins big time (this used to be covered by the reduction example in the CUDA SDK; the accompanying slides are around somewhere)
- same for scan/prefix sum (there is at least one paper by Harris, Sengupta, Owens et al.; I think it is also in GPU Gems 3; I can dig up the citations if you need them). The difference comes from the extra flexibility CUDA exposes.
- I have a few example kernels that are more efficient in GLSL (the prolongation step of geometric multigrid), but that is probably because I have not fully thought the CUDA variant through yet. I have been planning to for at least two years; so much to do, so little time.
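The reduction case is the clearest illustration of that flexibility gap: in GLSL you are stuck with log(N) ping-pong render passes, while in CUDA each block can collapse its chunk in shared memory within a single kernel launch. A simplified version in the style of the SDK reduction example (my own sketch; assumes blockDim.x is a power of two, and the per-block partial sums still need a second pass or a host-side sum):

```cuda
// Each block reduces blockDim.x * 2 consecutive input elements
// to one partial sum. Launch with shared memory size
// blockDim.x * sizeof(float).
__global__ void reduce_sum(const float *in, float *partial, int n)
{
    extern __shared__ float sdata[];           // one float per thread
    unsigned tid = threadIdx.x;
    unsigned i   = blockIdx.x * blockDim.x * 2 + tid;

    // First add during load, so no thread is idle in pass one.
    float x = (i < n) ? in[i] : 0.0f;
    if (i + blockDim.x < n)
        x += in[i + blockDim.x];
    sdata[tid] = x;
    __syncthreads();

    // Tree reduction in shared memory -- the part GLSL cannot express.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        partial[blockIdx.x] = sdata[0];
}
```

The equivalent GLSL version has to write intermediate results to a texture and start a whole new render pass for every halving step, which is where the big performance difference comes from.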