efficient indexing for arrays

Hi all

In my program, the size of the arrays can vary based on what the user specifies. They store enormous amounts of information, anywhere from 500 x 500 up to 1000 x 1000 and possibly even 10000 x 10000 elements. I was wondering what would be the most efficient way to create threads to compute this data. For the 500 x 500 case I specify a (20, 20) block size and a (25, 25) grid size. Now, to cover 1000 x 1000, should I run a for loop that goes over the data twice and multiplies the index numbers by 2 the second time around, as shown below? Or is there a more efficient, or just better, way to approach this?

Example:

int j = blockIdx.x*blockDim.x + threadIdx.x;
int i = blockIdx.y*blockDim.y + threadIdx.y;

for (int n = 1; n <= 2; n++)
{
    j = j*n;   // second pass: indices doubled
    i = i*n;

    array[i*width + j] = array[i*width + j] + data*moredata;
}

Rule of thumb for coalesced reads (you want this, especially if you are not using textures to store the data => use textures to store the data, that way you get the benefit of the cached read, which might be quite a big deal): if your thread 'i' reads or writes array[n], then thread 'i+1' should read/write array[n+1], where the array entry type is 32 bits wide. Using a 64-bit type causes a small performance cost (they say), and a 128-bit type a big one (I wonder if this is due to bank conflicts).
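To make the rule concrete, here is a minimal sketch of a coalesced versus a non-coalesced access pattern (the kernel names, the scale factor and the float element type are just illustrative assumptions; float is a 32-bit type):

// Coalesced: thread i touches array[idx], thread i+1 touches array[idx+1].
__global__ void scale_coalesced(float *array, float factor, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        array[idx] = array[idx] * factor;
}

// Not coalesced: consecutive threads are 'stride' elements apart.
__global__ void scale_strided(float *array, float factor, int n, int stride)
{
    int idx = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (idx < n)
        array[idx] = array[idx] * factor;
}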
Another pointer would be to use block sizes that are a multiple of 16 threads (preferably a power-of-two number of threads).
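For example, a host-side launch along these lines keeps the block a multiple of 16 in each dimension and rounds the grid up so it covers any width x height (myKernel, d_array, width and height are placeholder names; the kernel itself then needs an if (i < height && j < width) bounds check):

dim3 block(16, 16);                           // 256 threads, multiple of 16 per dimension
dim3 grid((width  + block.x - 1) / block.x,   // ceiling division so the grid
          (height + block.y - 1) / block.y);  // covers the whole array
myKernel<<<grid, block>>>(d_array, width, height);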

Ultimately, if speed is of utmost importance to you, then you should benchmark all the different options and choose a strategy based on the results. Throughput is typically not trivial to forecast, so measure the options, keeping their number down to a minimum by ruling out the trivially bad choices and concentrating on best practices.
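A minimal sketch of how you might time each candidate launch with CUDA events (myKernel, grid, block and the kernel arguments are placeholders for whichever variant you are measuring):

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
myKernel<<<grid, block>>>(d_array, width, height);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);          // wait for the kernel to finish

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
printf("kernel time: %f ms\n", ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);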