Thread-local arrays in shared memory: I want to see if I understood...

Hi all. Despite compulsively reading the manual, I’m not sure I understand
the subtleties of memory access…
I would like to know if the following scenario makes sense.

Assume that every thread of a block needs a working array of 100 integers.
(in practice I do not really “share” anything among threads…)

Since 100*4*32 = 12800 bytes, and 16384/2 < 12800 < 16384 (available shared memory),
I decide to use 32 threads per block.

If I understood the manual, the 32-bit words are assigned cyclically to
banks, so, if I have

__shared__ int Data[100*32];

Data[0], Data[16], Data[32],… belong to Bank 0,
Data[1], Data[17], Data[33],… belong to Bank 1,

Data[15], Data[31], and so on belong to Bank 15.
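To make the cyclic assignment concrete, here is a minimal sketch of the mapping described above. It assumes 16 banks of 32-bit words (as on the hardware discussed in this thread) and that Data[0] happens to fall in bank 0. The names kNumBanks and bankOf are made up for illustration; they are not part of the CUDA API.

```cuda
#include <cstdio>

// Assumption: 16 banks, each one 32-bit word wide.
constexpr int kNumBanks = 16;

// Hypothetical helper: bank holding the i-th 32-bit word of a shared
// array whose first element sits in bank 0.
__host__ __device__ constexpr int bankOf(int wordIndex) {
    return wordIndex % kNumBanks;
}

int main() {
    // The indices listed in the post.
    const int samples[] = {0, 16, 32, 1, 17, 33, 15, 31};
    for (int i : samples)
        printf("Data[%d] -> bank %d\n", i, bankOf(i));
    return 0;
}
```

If the array does not start in bank 0, every index shifts by the same constant, so which accesses collide does not change.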

Then, to allow each thread to work freely on its array
without conflicts with other threads, I should have each thread of a half-warp
work on a different bank (?).

Would something like this work:

#define V(x) Data[threadIdx.x + (x)*32]

to use V(0)… V(99) in each thread as a local array?
In this way, if I’m right, thread 0 will work only on bank 0,
thread 1 will work only on bank 1 and so on, and the
same for the second half-warp, i.e., thread 16 will work on bank 0
and so on. Is it correct?
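
For reference, here is a small self-contained sketch of the scheme, assuming one block of 32 threads and 100 ints per thread as described. The kernel name perThreadScratch, the constants and the test values are invented for the example. The macro argument is parenthesized so that something like V(i+1) expands correctly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 32
#define ARRAY_LEN 100

// Element x of the calling thread's private array. Consecutive threads
// touch consecutive 32-bit words, so the 16 threads of a half-warp
// land in 16 different banks.
#define V(x) Data[threadIdx.x + (x) * THREADS_PER_BLOCK]

__global__ void perThreadScratch(int *out) {
    __shared__ int Data[ARRAY_LEN * THREADS_PER_BLOCK];  // 12800 bytes

    // Each thread fills its own 100-element strided "column"...
    for (int i = 0; i < ARRAY_LEN; ++i)
        V(i) = threadIdx.x * 1000 + i;

    // ...and reads it back. No __syncthreads() is used because no
    // thread reads a word written by another thread.
    int sum = 0;
    for (int i = 0; i < ARRAY_LEN; ++i)
        sum += V(i);

    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

int main() {
    int *dOut = nullptr;
    int hOut[THREADS_PER_BLOCK];
    cudaMalloc(&dOut, sizeof(hOut));
    perThreadScratch<<<1, THREADS_PER_BLOCK>>>(dOut);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dOut);

    // Per-thread sum should be 100 * (t * 1000) + (0 + 1 + ... + 99).
    bool ok = true;
    for (int t = 0; t < THREADS_PER_BLOCK; ++t)
        ok = ok && (hOut[t] == 100 * t * 1000 + 4950);
    printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

With this layout, thread t always accesses words whose index is t plus a multiple of 32, so with 16 banks it stays in bank t % 16 for its whole array, and threads 0 and 16 share a bank but sit in different half-warps.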


Based on my understanding of how banks work, I believe you’re right.