efficient indexing for arrays

Hi all

In my program, the size of the arrays can vary based on what the user specifies. They store enormous amounts of information, anywhere from 500 x 500 up to 1000 x 1000 and possibly even 10000 x 10000 elements. I was wondering what would be the most efficient way to create threads to compute this data. For the 500 x 500 case I specify a (20, 20) block size and a (25, 25) grid size. Now, to cover 1000 x 1000, should I run a for loop that goes over the data twice and multiplies the index numbers by 2 the second time around, as shown below? Or is there a more efficient, or just better, way to approach this?

Example:

int j = blockIdx.x*blockDim.x + threadIdx.x;
int i = blockIdx.y*blockDim.y + threadIdx.y;

for (int n = 1; n <= 2; n++)
{
    j = j*n;   // second pass: indices doubled
    i = i*n;

    array[i*width + j] = array[i*width + j] + data*moredata;
}

Rule of thumb for coalesced reads (you want this, especially if you are not using textures to store the data => use textures to store the data, that way you get the benefit of the cached read, which might be quite a big deal): if your thread 'i' reads or writes array[n], then thread 'i+1' should read/write array[n+1], where the array entry type is 32 bits wide. Using a 64-bit type causes a small performance cost (they say), and a 128-bit type a big one (I wonder if this is due to bank conflicts).
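To make the rule concrete, here is a minimal sketch of a coalesced versus a non-coalesced access pattern (the kernel names, the scale factor and the float element type are just illustrative assumptions; float is a 32-bit type):

// Coalesced: thread i touches array[idx], thread i+1 touches array[idx+1].
__global__ void scale_coalesced(float *array, float factor, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        array[idx] = array[idx] * factor;
}

// Not coalesced: consecutive threads are 'stride' elements apart.
__global__ void scale_strided(float *array, float factor, int n, int stride)
{
    int idx = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (idx < n)
        array[idx] = array[idx] * factor;
}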
Another pointer would be to use block sizes that are a multiple of 16 threads (preferably a power-of-two number of threads).
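For example, a host-side launch along these lines keeps the block a multiple of 16 in each dimension and rounds the grid up so it covers any width x height (myKernel, d_array, width and height are placeholder names; the kernel itself then needs an if (i < height && j < width) bounds check):

dim3 block(16, 16);                           // 256 threads, multiple of 16 per dimension
dim3 grid((width  + block.x - 1) / block.x,   // ceiling division so the grid
          (height + block.y - 1) / block.y);  // covers the whole array
myKernel<<<grid, block>>>(d_array, width, height);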

Ultimately, if speed is of utmost importance to you, then you should benchmark all the different options and choose a strategy based on the results. Throughput is typically not trivial to forecast, so measure the options, keeping their number down to a minimum by ruling out the trivially bad choices and concentrating on best practices.
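A minimal sketch of how you might time each candidate launch with CUDA events (myKernel, grid, block and the kernel arguments are placeholders for whichever variant you are measuring):

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
myKernel<<<grid, block>>>(d_array, width, height);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);          // wait for the kernel to finish

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
printf("kernel time: %f ms\n", ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);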