Some articles claim that accessing byte arrays in shared memory causes bank conflicts. However, the CUDA C Programming Guide (v8.0.61), section G.3.3, Shared Memory (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory-2-x), says:
A shared memory request for a warp does not generate a bank conflict between two threads that access any address within the same 32-bit word (even though the two addresses fall in the same bank): In that case, for read accesses, the word is broadcast to the requesting threads (multiple words can be broadcast in a single transaction) and for write accesses, each address is written by only one of the threads (which thread performs the write is undefined).
This means, in particular, that there are no bank conflicts if an array of char is accessed as follows, for example:
extern __shared__ char shared[];
char data = shared[BaseIndex + tid];
This seems to apply to compute capability >= 2.x.
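To make the rule quoted above concrete, here is a rough host-side sketch of the bank arithmetic. It is only a simplified model, not a measurement of real hardware: the function names (conflictDegree, consecutiveBytes) are hypothetical, the bank width is assumed to be 4 bytes, and the mergeSameWord = false mode is my assumption about how a device without the same-word rule would treat byte accesses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Simplified model of one shared memory request (assumption: 4-byte banks).
// byteAddrs:     byte offset accessed by each thread taking part in the request.
// numBanks:      e.g. 32 for a full warp, or 16 for a half-warp (assumed for 1.x).
// mergeSameWord: true  -> accesses to the same 32-bit word do not conflict
//                         (the rule quoted from the guide);
//                false -> every distinct byte address counts as a separate access
//                         (assumed behavior without that rule).
// Returns the largest number of serialized accesses needed by any single bank.
int conflictDegree(const std::vector<std::size_t>& byteAddrs,
                   int numBanks, bool mergeSameWord) {
    assert(numBanks > 0);
    std::vector<std::set<std::size_t>> perBank(numBanks);
    for (std::size_t addr : byteAddrs) {
        std::size_t word = addr / 4;          // index of the 32-bit word
        std::size_t bank = word % numBanks;   // successive words map to successive banks
        perBank[bank].insert(mergeSameWord ? word : addr);
    }
    std::size_t worst = 0;
    for (const auto& accesses : perBank) {
        worst = std::max(worst, accesses.size());
    }
    return static_cast<int>(worst);
}

// Byte offsets for the pattern shared[baseIndex + tid], tid = 0 .. numThreads - 1.
std::vector<std::size_t> consecutiveBytes(std::size_t baseIndex, int numThreads) {
    std::vector<std::size_t> addrs(numThreads);
    for (int tid = 0; tid < numThreads; ++tid) {
        addrs[tid] = baseIndex + static_cast<std::size_t>(tid);
    }
    return addrs;
}
```

Under this model, conflictDegree(consecutiveBytes(0, 32), 32, true) is 1 (no conflict), while conflictDegree(consecutiveBytes(0, 16), 16, false) is 4, because four neighboring bytes share one bank.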
So am I correct in concluding that accessing byte arrays in shared memory does cause bank conflicts on compute capability 1.x (Tesla), but does not on modern GPUs with compute capability >= 2.x (Fermi, Kepler, Maxwell, and Pascal)?