I have seen many comments about various aspects of GPU architecture that are not exposed to a CUDA programmer but are exposed to someone using OpenGL. Not being a graphics person (at least not yet), I am left wondering what exactly is not exposed that could potentially be useful for scientific algorithms?
In particular, I am curious about what the texture hardware can do. Currently, it is possible to bind 1D, 2D, or 3D arrays to textures and read them into CUDA cores through the texture cache. As well as a possible speedup from data reuse in the cache, this also allows one to do linear interpolation on these arrays (as described in the appendix of the programming guide). Can the texturing hardware do anything else? In particular, I am interested in the case whereby my textures are composed of vectors (float2, float3, half4, etc.). Does the texture unit have the ability to perform interpolation within the vector, or to rotate some of the array components (swizzling?) before interpolating? Do the coefficients of the interpolation have to be positive, or are negative values allowed?
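For reference, the basic mechanism I mean is something like the following minimal sketch, using the texture reference API of the CUDA 3.x era (the array contents, grid configuration, and the name `tex` are just illustrative):

```cuda
#include <cuda_runtime.h>

// File-scope texture reference: 1D, float elements, read back as floats.
texture<float, 1, cudaReadModeElementType> tex;

__global__ void sample(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        // Fractional coordinates are linearly interpolated in hardware
        // when filterMode is cudaFilterModeLinear; i + 0.5f hits the
        // texel centre exactly.
        out[i] = tex1D(tex, i + 0.5f);
}

int main()
{
    const int N = 256;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *arr;
    cudaMallocArray(&arr, &desc, N);
    // ... fill arr with cudaMemcpyToArray ...
    tex.filterMode = cudaFilterModeLinear;  // enable hardware interpolation
    cudaBindTextureToArray(tex, arr);
    // ... allocate out and launch sample<<<blocks, threads>>>(out, N) ...
    return 0;
}
```

My question is essentially whether the hardware can do anything richer than this per-element (or, for float2/float4 textures, per-component) interpolation.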
I understand that DirectX 11 requires that textures also be writable, not just read-only. Presumably this ability will be exposed in CUDA 3.x for Fermi GPUs?
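If I understand the documentation correctly, this is what the surface reference API in CUDA 3.x exposes on Fermi (sm_20): a cudaArray created with the load/store flag can be written from a kernel, which is roughly the CUDA analogue of a writable texture. A sketch, assuming the CUDA 3.x surface reference names:

```cuda
#include <cuda_runtime.h>

// File-scope 2D surface reference (CUDA 3.x surface reference API).
surface<void, 2> surf;

__global__ void fill(int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        // The x coordinate of surf2Dwrite is in bytes, not elements.
        surf2Dwrite(1.0f, surf, x * (int)sizeof(float), y);
}

// Host side: the array must be created with cudaArraySurfaceLoadStore:
//   cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
//   cudaArray *arr;
//   cudaMallocArray(&arr, &desc, W, H, cudaArraySurfaceLoadStore);
//   cudaBindSurfaceToArray(surf, arr);
//   fill<<<grid, block>>>(W, H);
```

Note that surface reads and writes go around the texture filtering path, so you give up the interpolation discussed above on that access.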
On the subject of hardware that isn't yet exposed to CUDA, what does "rendering to the framebuffer" mean? I understand that this is not exposed in CUDA, but that it effectively gives one floating-point atomics, and so allows scatter-type algorithms (if one writes in OpenGL). Is there any reason why this ability isn't exposed to CUDA?
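The closest thing I can see in CUDA itself is that Fermi (compute capability 2.0) adds a single-precision float overload of atomicAdd(), which at least lets one write scatter-with-accumulate directly, even if it is not the same path as framebuffer blending. A sketch (the kernel and array names are just illustrative; it needs -arch=sm_20):

```cuda
#include <cuda_runtime.h>

// Scatter-with-accumulate: each thread adds its value into an arbitrary
// output location. The float atomicAdd overload requires sm_20 (Fermi).
__global__ void scatter_add(float *image, const int *idx,
                            const float *val, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&image[idx[i]], val[i]);
}
```

Is framebuffer blending doing something beyond this that makes it hard to expose?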
Thanks in advance.