For the first time, I would like to try using the linear-interpolation feature of 2D texture memory.
This link was useful for the case where there is a single 2D texture.
In my use case I will have an input set of 60 float matrices, each of size (500,700) for example.
I would like to handle 10 such matrices per kernel launch, and would attempt to have 10 distinct textures, in array form (if possible), like this:
texture<float, cudaTextureType2D, cudaReadModeElementType> tex;
Then I would bind the current batch of 10 (500,700) matrices to that set of textures before each kernel call, use them, unbind, and repeat the process until done.
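As a possible alternative to an array of texture references (which the old texture-reference API does not support), here is a sketch of the bind step using the runtime texture-object API (`cudaCreateTextureObject`), where each matrix in the batch gets its own 2D texture with hardware bilinear filtering. The function and variable names are my own assumptions; the batch size and matrix dimensions are taken from the question.

```cuda
#include <cuda_runtime.h>

#define BATCH  10
#define WIDTH  500   // x dimension
#define HEIGHT 700   // y dimension

// Hypothetical helper: create one 2D texture object per matrix in the batch.
// d_mats[i] points to a pitched device allocation holding matrix i,
// with `pitch` as returned by cudaMallocPitch.
void createBatchTextures(float *d_mats[BATCH], size_t pitch,
                         cudaTextureObject_t texs[BATCH])
{
    for (int i = 0; i < BATCH; ++i) {
        cudaResourceDesc resDesc = {};
        resDesc.resType = cudaResourceTypePitch2D;
        resDesc.res.pitch2D.devPtr       = d_mats[i];
        resDesc.res.pitch2D.desc         = cudaCreateChannelDesc<float>();
        resDesc.res.pitch2D.width        = WIDTH;
        resDesc.res.pitch2D.height       = HEIGHT;
        resDesc.res.pitch2D.pitchInBytes = pitch;

        cudaTextureDesc texDesc = {};
        texDesc.addressMode[0]  = cudaAddressModeClamp;
        texDesc.addressMode[1]  = cudaAddressModeClamp;
        texDesc.filterMode      = cudaFilterModeLinear;  // hardware bilinear interpolation
        texDesc.readMode        = cudaReadModeElementType;
        texDesc.normalizedCoords = 0;                    // unnormalized (x, y) coordinates

        cudaCreateTextureObject(&texs[i], &resDesc, &texDesc, NULL);
    }
}

// Each texture object can then be passed to the kernel by value and
// sampled with tex2D; note the +0.5f offset to hit texel centers.
__global__ void interpKernel(cudaTextureObject_t tex, float *out,
                             float xOff, float yOff)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < WIDTH && y < HEIGHT)
        out[y * WIDTH + x] = tex2D<float>(tex, x + xOff + 0.5f, y + yOff + 0.5f);
}
```

Texture objects are plain values, so the "bind/unbind" cycle becomes destroying the old objects with `cudaDestroyTextureObject` and creating new ones for the next batch.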
Since I only need to interpolate across (x,y), I believe I should not use 3D textures, because their interpolation is across (x,y,z).
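One option that seems to fit this constraint is a 2D *layered* texture: filtering is bilinear within each (x,y) layer, with no interpolation across layers, so 10 matrices could live in one texture. This is a sketch under that assumption, with my own helper name; dimensions and layer count come from the question.

```cuda
#include <cuda_runtime.h>

#define LAYERS 10
#define WIDTH  500
#define HEIGHT 700

// Hypothetical helper: upload LAYERS contiguous WIDTH x HEIGHT float
// matrices into a layered CUDA array and wrap it in one texture object.
cudaTextureObject_t createLayeredTexture(const float *h_data)
{
    cudaChannelFormatDesc ch = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaExtent ext = make_cudaExtent(WIDTH, HEIGHT, LAYERS);
    cudaMalloc3DArray(&arr, &ch, ext, cudaArrayLayered);

    cudaMemcpy3DParms p = {};
    p.srcPtr   = make_cudaPitchedPtr((void *)h_data,
                                     WIDTH * sizeof(float), WIDTH, HEIGHT);
    p.dstArray = arr;
    p.extent   = ext;
    p.kind     = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&p);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.filterMode     = cudaFilterModeLinear;  // bilinear in (x,y) only
    texDesc.readMode       = cudaReadModeElementType;
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;

    cudaTextureObject_t tex;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
    return tex;
}

// In a kernel, layer `l` is sampled with (x,y) interpolation only:
//   float v = tex2DLayered<float>(tex, x + 0.5f, y + 0.5f, l);
```

The layer index passed to `tex2DLayered` is an integer, so no blending between matrices occurs, unlike the z coordinate of a 3D texture.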
I already implemented a working application using __ldg() and my own interpolation, but since this is a built-in feature of CUDA textures, I thought there might be a faster approach.
I did Google this topic before posting, and could not find a specific answer or (even more useful) a working example. Using textures in this manner has a more complicated setup process than handling standard device memory.
How would I go about doing this, and would it result in better performance than using __ldg()?