Will this code cause a bank conflict?

Hello,

In the following simple kernel, each thread accesses one 32-bit word in shared memory.
I think this code will cause a bank conflict, because all elements in a column access the same bank.
Rows are consecutive in global memory.
I’m running it on a TX2. I think the bank size is 32 bits.

Am I right?

Thank you,
Zvika

__global__ void my_kernel (int *pSrc, int nx, int ny)
{
	unsigned int ix = threadIdx.x + blockIdx.x * blockDim.x;
	unsigned int iy = threadIdx.y + blockIdx.y * blockDim.y;
	unsigned int idx = iy*nx + ix;
	
	__shared__ int vec[4096];

	pSrc[idx] = vec[ix];
}

int dimx = 32, dimy = 32;

dim3 block(dimx, dimy);
dim3 grid((nx+block.x-1)/block.x, (ny+block.y-1)/block.y);

my_kernel <<<grid, block>>> (pSrc, nx, ny);

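I don’t see any possibility for bank conflicts. With a 32x32 block, each warp is made up of 32 threads that share one threadIdx.y and have consecutive threadIdx.x values, so the threads of a warp read 32 consecutive 32-bit words of vec, and consecutive 32-bit words fall in different banks. However, this index: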

unsigned int ix = threadIdx.x + blockIdx.x * blockDim.x;

presumably could go from 0 up to some possibly large value. If it reached 4096 or more, you would have an invalid (out-of-bounds) access to shared memory:

__shared__ int vec[4096];
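To make the bounds concern concrete, here is a rough host-side sketch of the arithmetic. It is not from the original post: `kVecLen`, `max_ix` and `ix_fits_in_vec` are hypothetical names, and the only things taken from the question are the grid computation and the array length of 4096.

```cpp
#include <cassert>

// Length of `vec` in the kernel from the question.
constexpr unsigned kVecLen = 4096;

// Largest value that ix = threadIdx.x + blockIdx.x * blockDim.x can take
// when grid.x is computed as (nx + block_x - 1) / block_x, as in the question.
// (Hypothetical helper, plain host-side arithmetic.)
unsigned max_ix(unsigned nx, unsigned block_x) {
    assert(nx > 0 && block_x > 0);
    unsigned grid_x = (nx + block_x - 1) / block_x;  // round up
    return grid_x * block_x - 1;
}

// True when every thread's ix is a valid index into vec[kVecLen].
bool ix_fits_in_vec(unsigned nx, unsigned block_x) {
    return max_ix(nx, block_x) < kVecLen;
}
```

With block.x = 32, nx = 4096 gives a largest ix of 4095, which still fits, while nx = 5000 gives 5023, which does not. Inside the kernel, one common way to guard against this is to return early when ix or iy is out of range, or to index vec with threadIdx.x rather than the global ix, depending on what the code is meant to do.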

I’m not sure what you mean by all elements in a column accessing the same bank. Your shared memory storage and indexing do not place all elements in a column, nor do they place all elements in the same bank.
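The bank mapping can be checked with a small host-side model. This is only a sketch under stated assumptions: 32 banks, each 32 bits wide, with successive 32-bit words assigned to successive banks, and a warp of 32 lanes. The names `bank_of` and `conflict_degree` are hypothetical and are not part of any CUDA API.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>

constexpr unsigned kNumBanks = 32;  // assumption: 32 banks, 4 bytes wide
constexpr unsigned kWarpSize = 32;

// Bank holding the 32-bit word at index word_index of a shared int array.
unsigned bank_of(unsigned word_index) { return word_index % kNumBanks; }

// Worst-case number of distinct words that one warp requests from a single
// bank. 1 means conflict-free. Lanes that read the same word are served by
// a broadcast, so only distinct words per bank are counted.
template <typename IndexFn>
unsigned conflict_degree(IndexFn word_index_of_lane) {
    std::set<unsigned> words_in_bank[kNumBanks];
    for (unsigned lane = 0; lane < kWarpSize; ++lane) {
        unsigned w = word_index_of_lane(lane);
        words_in_bank[bank_of(w)].insert(w);
    }
    std::size_t worst = 0;
    for (const auto& words : words_in_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<unsigned>(worst);
}
```

In the kernel above, lane k of a warp reads vec[blockIdx.x * 32 + k], so the model is fed consecutive indices and reports a degree of 1. A pattern such as vec[threadIdx.x * 32] would instead put all 32 lanes in one bank and give a degree of 32, which is the situation the question seems to have in mind.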