In GLSL shader programming, it is advisable to process an array of elements by making it a 2D matrix. Does the same hold true for CUDA?
That is, if I take a 1D array of 1000 elements and perform a reduction on it, is it advisable to convert it to 2D to gain performance?
Secondly, CUDA memory is divided into several types, such as global, shared, texture, and constant (excluding registers). If I treat the 1D array as a texture and then perform the reduction, will it be faster than processing it in global and/or shared memory?
In short, is this ordering correct in terms of speed (fastest first)?
Shared memory > texture memory > global memory (excluding registers)
I am curious about texture memory because in GLSL programming we treat a 2D texture as the memory data structure for the GPU and, for image processing, render to 2D textures.
Finally, what is the CUDA equivalent of the FBO (framebuffer object) used with GLSL?
Thanks.