My algorithm requires each thread to store an array of doubles of length “dim”. Let's call this variable “state”. Typically the value of dim will vary from 6 to 12, but it can be as high as 128. In most cases the value of dim will be 6.
The value of “dim” will be fixed within each kernel call, but the final code will have multiple kernel calls with different dim values.
The “state” variable will be reused many times per thread, hence I plan to put it in shared memory.
Currently I am planning to define the shared variable for each block as follows:
__shared__ double state[blockSize][dim]; // and access it as state[tid][i] = some computation; state[tid][j] = some computation by the same thread; ...
blockSize is 64 for now.
Each thread would access the whole second (“dim”) dimension of its own row of the variable to do various calculations on a per-thread basis.
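For concreteness, a minimal sketch of the layout described above might look like the following. It assumes dim is supplied as a template parameter (called DIM here) so that the shared array size is known at compile time. The names stateKernel and BLOCK_SIZE, and the arithmetic inside the kernel, are placeholders and not the real algorithm.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size; the post uses 64 for now.
const int BLOCK_SIZE = 64;

// DIM is a template parameter (an assumption of this sketch) so that the
// statically sized shared array has compile-time dimensions.
template <int DIM>
__global__ void stateKernel(double *out)
{
    // Row-major layout from the post: one row of DIM doubles per thread.
    __shared__ double state[BLOCK_SIZE][DIM];

    const int tid = threadIdx.x;

    // Each thread writes only its own row ...
    for (int i = 0; i < DIM; ++i)
        state[tid][i] = tid + 0.5 * i;   // placeholder computation

    // ... and later reuses the same row.
    double sum = 0.0;
    for (int i = 0; i < DIM; ++i)
        sum += state[tid][i];

    out[blockIdx.x * BLOCK_SIZE + tid] = sum;
}

int main()
{
    const int numBlocks = 4;
    const int n = numBlocks * BLOCK_SIZE;

    double *dOut = NULL;
    cudaMalloc(&dOut, n * sizeof(double));

    // One instantiation per dim value; each kernel call uses a single dim.
    stateKernel<6><<<numBlocks, BLOCK_SIZE>>>(dOut);
    stateKernel<12><<<numBlocks, BLOCK_SIZE>>>(dOut);

    std::vector<double> hOut(n);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(double), cudaMemcpyDeviceToHost);
    printf("out[0] = %f, out[%d] = %f\n", hOut[0], n - 1, hOut[n - 1]);

    cudaFree(dOut);
    return 0;
}
```

One thing this sketch makes visible: with blockSize = 64 and dim = 128 the array would need 64 × 128 × 8 = 65536 bytes, which is more than the 48 KB of shared memory a block can use on Fermi, so the largest dim values would need a smaller block or a different placement.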
As I am new to Fermi (using an M2090), I am not sure whether this is the best way to lay out such a variable in shared memory, or whether there is a better way to define and read it (a 1D layout?). Will the access pattern described above lead to bank conflicts? (I suspect it will.) Is there any way to avoid them or minimize their impact?
The programming guide (4.2) doesn’t give much information on double access on Fermi for a 2D data layout. The 2D layout is just for convenience, but I am not sure how to make a 1D layout that will not lead to bank conflicts, given that the value of dim varies.
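To make the bank-conflict question concrete, here is a small host-side sketch (plain C++ that also builds under nvcc) that counts how many threads of a half-warp would collide for a given layout. It rests on an assumed model of Fermi shared memory: 32 banks of 32-bit words, a double occupying two adjacent banks, and 64-bit accesses evaluated per half-warp of 16 threads. The function names and the transposed layout state[k * blockSize + tid] are illustrative assumptions, not something taken from the programming guide.

```cpp
#include <algorithm>

// Assumed model: 32 banks x 4 bytes, so a double spans two banks and there
// are 16 double-wide "slots"; 64-bit accesses are checked per half-warp.
const int HALF_WARP = 16;
const int DOUBLE_SLOTS = 16;

// Given the index (in doubles) that each thread of a half-warp touches,
// return the largest number of threads that fall into the same slot.
// Assumes all indices are distinct, which holds for both layouts below.
int conflictDegree(const int idx[HALF_WARP])
{
    int count[DOUBLE_SLOTS] = {0};
    for (int t = 0; t < HALF_WARP; ++t)
        ++count[idx[t] % DOUBLE_SLOTS];
    return *std::max_element(count, count + DOUBLE_SLOTS);
}

// Layout from the post: state[tid][k], i.e. index tid * dim + k.
int rowMajorDegree(int dim, int k)
{
    int idx[HALF_WARP];
    for (int t = 0; t < HALF_WARP; ++t)
        idx[t] = t * dim + k;
    return conflictDegree(idx);
}

// Hypothetical 1D alternative: state[k * blockSize + tid], so that
// consecutive threads touch consecutive doubles.
int transposedDegree(int blockSize, int k)
{
    int idx[HALF_WARP];
    for (int t = 0; t < HALF_WARP; ++t)
        idx[t] = k * blockSize + t;
    return conflictDegree(idx);
}
```

Under this assumed model, rowMajorDegree(6, 0) comes out as a 2-way conflict and rowMajorDegree(12, 0) as 4-way, while an odd row length such as 7, or the transposed indexing, gives a degree of 1. Whether the real hardware behaves exactly like this model is the part I would like confirmed.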
Thanks for all the help.