Large 2D array memory optimization

Hi everyone,
I am working on a CUDA project for my university major, and I’m quite new to the whole scene, so please bear with me!
This forum’s been very helpful in determining a direction for my solution to the problem, but I still had a few questions regarding handling large arrays.
My problem entails parallelising a calculation in which each loop iteration depends on only a few rows of a 100000*100 array.
I have also structured the array so that the memory accesses are localized. By this I mean:
Loop 1 reads rows 1:50 (for example)
Loop 2 reads rows 1:53
Loop 3 reads rows 3:55
and so on. Each subsequent loop’s starting and ending rows are greater than or equal to those of the previous loop.

My initial plan was to keep this array in texture memory. Am I on the right track?
Thanks in advance

Yes.
If you are using a compute capability 2.x GPU, you could also just use normal array accesses, as the global memory cache on Fermi is larger than the texture cache.
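To make "normal array accesses" concrete, here is a minimal sketch of what that could look like for the row-window pattern described above. It assumes a row-major float array and one thread block per loop iteration. The kernel name `windowColumnSum`, the `rowStart`/`rowEnd` arrays, and the column sum itself are hypothetical placeholders, since the actual calculation was not posted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical placeholder computation: for each iteration i, sum every
// column over the rows [rowStart[i], rowEnd[i]) of a row-major array.
__global__ void windowColumnSum(const float* data, const int* rowStart,
                                const int* rowEnd, float* out, int cols)
{
    int i = blockIdx.x;  // one block per iteration ("loop")
    for (int c = threadIdx.x; c < cols; c += blockDim.x) {
        float sum = 0.0f;
        // Neighbouring threads read neighbouring columns of the same row,
        // so each warp touches one contiguous segment per row.
        for (int r = rowStart[i]; r < rowEnd[i]; ++r)
            sum += data[r * cols + c];
        out[i * cols + c] = sum;
    }
}

// Runs the kernel on a small all-ones array and returns column 0 of each
// iteration's result, which should equal that iteration's window length.
// Error checking is omitted to keep the sketch short.
std::vector<float> runWindowDemo()
{
    const int rows = 64, cols = 100, iters = 3;
    std::vector<float> hData(rows * cols, 1.0f);
    // Zero-based, half-open versions of the windows 1:50, 1:53, 3:55.
    int hStart[iters] = {0, 0, 2};
    int hEnd[iters]   = {50, 53, 55};

    float *dData, *dOut;
    int *dStart, *dEnd;
    cudaMalloc(&dData, hData.size() * sizeof(float));
    cudaMalloc(&dOut, iters * cols * sizeof(float));
    cudaMalloc(&dStart, sizeof(hStart));
    cudaMalloc(&dEnd, sizeof(hEnd));
    cudaMemcpy(dData, hData.data(), hData.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dStart, hStart, sizeof(hStart), cudaMemcpyHostToDevice);
    cudaMemcpy(dEnd, hEnd, sizeof(hEnd), cudaMemcpyHostToDevice);

    windowColumnSum<<<iters, 128>>>(dData, dStart, dEnd, dOut, cols);

    std::vector<float> hOut(iters * cols);
    cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dData); cudaFree(dOut); cudaFree(dStart); cudaFree(dEnd);

    std::vector<float> firstCol(iters);
    for (int i = 0; i < iters; ++i)
        firstCol[i] = hOut[i * cols];
    return firstCol;
}

int main()
{
    std::vector<float> sums = runWindowDemo();
    for (size_t i = 0; i < sums.size(); ++i)
        std::printf("iteration %zu: column 0 sum = %.0f\n", i, sums[i]);
    return 0;
}
```

Because consecutive iterations read overlapping row ranges, blocks that run at about the same time tend to touch the same rows, which is the situation where a global memory cache can help.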

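Even with caching, try to keep memory accesses as local as possible, as the ratio between cache size and maximum threads per SM is only between 8 and 32 bytes per thread. For example, 48 KB of L1 cache shared by up to 1536 resident threads on a compute capability 2.x SM works out to 32 bytes per thread.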
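If you want to check that ratio on your own card, a rough sketch follows. It reads the maximum resident threads per SM from the device properties and divides an assumed L1 size by it. The 16 KB and 48 KB figures are the two L1 configurations of compute capability 2.x parts and are hard-coded here as an assumption, not queried; the helper `cacheBytesPerThread` is a name made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Cache bytes available per resident thread when the SM is fully occupied.
__host__ __device__ inline double cacheBytesPerThread(int cacheBytes,
                                                      int residentThreads)
{
    return static_cast<double>(cacheBytes) / residentThreads;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }

    int threads = prop.maxThreadsPerMultiProcessor;
    // Assumed L1 sizes for a compute capability 2.x SM (not queried).
    const int l1Small = 16 * 1024;
    const int l1Large = 48 * 1024;

    std::printf("%s: up to %d resident threads per SM\n", prop.name, threads);
    std::printf("16 KB L1: %.1f bytes/thread\n",
                cacheBytesPerThread(l1Small, threads));
    std::printf("48 KB L1: %.1f bytes/thread\n",
                cacheBytesPerThread(l1Large, threads));

    // Request the larger L1 split; the runtime treats this as a preference.
    cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);
    return 0;
}
```

With so few bytes per thread, the cache only pays off when the threads resident on an SM share most of their data, as they do when neighbouring iterations read nearly the same rows.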