Hi all,
I have an array A in my shared memory, and each thread reads two neighboring elements of it. For example, thread 0 reads A[0] and A[1], and thread 1 reads A[1] and A[2]. Does this cause a bank conflict?
Thank you.
Casy
Yes, you have a bank conflict at A[1] …
When I access memory like this (for example, in a reduction), I use indices like the following, where n is the number of elements:
Thread[ i ]: A[ i ] and A[ i + (n >> 1) ]
(n >> 1 is the same as n / 2 for unsigned n. The parentheses matter: + binds more tightly than >>, so i + n >> 1 would mean (i + n) >> 1.)
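Here is a minimal host-side sketch of one such reduction step, written as plain C++ so it also compiles as CUDA host code. Each loop iteration stands in for one thread i, which adds A[i + (n >> 1)] into A[i]. The function names reduceStepHalves and reduceAll are hypothetical, and the sketch assumes n is a power of 2 so that every step halves an even-sized array. Within a half-warp, consecutive threads read consecutive words in both loads (A[i] in one, A[i + n/2] in the other), so each load touches distinct banks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of one "sequential addressing" reduction step:
// "thread" i combines A[i] and A[i + (n >> 1)].
void reduceStepHalves(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert(n % 2 == 0);               // the scheme assumes an even element count
    const std::size_t half = n >> 1;  // equals n / 2 for an unsigned n
    for (std::size_t i = 0; i < half; ++i) {  // one iteration per "thread"
        a[i] += a[i + half];
    }
    a.resize(half);  // the upper half has been folded into the lower half
}

// Repeats the step until one element is left; n must be a power of 2.
int reduceAll(std::vector<int> a) {
    while (a.size() > 1) {
        reduceStepHalves(a);
    }
    return a.empty() ? 0 : a[0];
}
```

For example, one step on {1, 2, 3, 4} leaves {4, 6}, and reduceAll on 1..8 returns their sum.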
You can also use something like:
unsigned int i = blockIdx.x * (blockDim.x * 2) + threadIdx.x; // TODO: adapt
unsigned int ib = i + blockDim.x;
Thread[ i ] accesses A[ i ] and A[ ib ].
These options don't have bank conflicts, but the number of elements should be even.
(Have a look at the SDK's reduction sample! ;) )
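The mapping i = blockIdx.x * (blockDim.x * 2) + threadIdx.x, ib = i + blockDim.x can be checked on the host with a small sketch that counts how often each element is touched. Here numBlocks and threadsPerBlock are hypothetical stand-ins for gridDim.x and blockDim.x (ordinary parameters, not CUDA built-ins). It shows that each block covers 2 * blockDim.x elements and every element is read exactly once, which is why the element count has to be even.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times each element is touched by the (i, ib) index pairs.
std::vector<int> touchCounts(unsigned numBlocks, unsigned threadsPerBlock) {
    assert(threadsPerBlock > 0);
    // Each block covers 2 * threadsPerBlock elements, so n is always even.
    std::vector<int> counts(
        static_cast<std::size_t>(numBlocks) * threadsPerBlock * 2, 0);
    for (unsigned b = 0; b < numBlocks; ++b) {
        for (unsigned t = 0; t < threadsPerBlock; ++t) {
            unsigned i = b * (threadsPerBlock * 2) + t;  // first element
            unsigned ib = i + threadsPerBlock;           // second element
            ++counts[i];
            ++counts[ib];
        }
    }
    return counts;
}
```

With 3 blocks of 4 threads, all 24 counts come out as 1.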
This response makes some incorrect assumptions.
Let's say we have the following example:
__shared__ uint foo[BLOCKSIZE + 1];
uint val1, val2;
val1 = foo[threadIdx.x];
val2 = foo[threadIdx.x + 1];
There is no bank conflict.
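To see why the two separate loads are conflict-free, here is a simplified host-side bank model. It assumes compute capability 1.x behavior: 16 banks of 32-bit words, bank = word index % 16, with requests resolved per half-warp of 16 threads. Newer GPUs use 32 banks and resolve per warp. Threads reading the same word are treated as a broadcast. The helpers conflictDegree and halfWarpWords are hypothetical names for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Assumed compute capability 1.x layout: 16 banks, half-warp of 16 threads.
constexpr unsigned kBanks = 16;
constexpr unsigned kHalfWarp = 16;

// Conflict degree of one request: the largest number of *distinct* words that
// map to the same bank (same-word reads are assumed to be broadcast).
unsigned conflictDegree(const std::vector<unsigned>& words) {
    std::map<unsigned, std::set<unsigned>> perBank;
    for (unsigned w : words) perBank[w % kBanks].insert(w);
    std::size_t degree = 0;
    for (const auto& entry : perBank) degree = std::max(degree, entry.second.size());
    return static_cast<unsigned>(degree);
}

// Word indices read by one half-warp when thread t reads word t * stride + offset.
std::vector<unsigned> halfWarpWords(unsigned stride, unsigned offset) {
    std::vector<unsigned> words;
    for (unsigned t = 0; t < kHalfWarp; ++t) words.push_back(t * stride + offset);
    return words;
}
```

The loads foo[threadIdx.x] and foo[threadIdx.x + 1] correspond to halfWarpWords(1, 0) and halfWarpWords(1, 1), and each has degree 1. By contrast, a single request in which thread t reads word 2t (halfWarpWords(2, 0)) has degree 2 under this model, because words 0 and 16 share bank 0.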
Now if you were doing something along the lines of:
__shared__ uint foo[BLOCKSIZE + 1];
uint2 val;
val = *(uint2 *) &foo[threadIdx.x]; // one 64-bit load of two adjacent words; the address must be 8-byte aligned
This would create a bank conflict.
There would be a quote-unquote “bank conflict.” But in fact both scenarios will simply execute in two cycles.