Hello,
In the following simple kernel, each thread accesses a 32-bit word in shared memory.
I think this code will cause a bank conflict, because all threads in a column access the same bank.
Each row is consecutive in global memory.
I'm running it on a TX2, and I think the bank size is 32 bits.
Am I right?
Thank you,
Zvika
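__global__ void my_kernel(int *pSrc, int nx, int ny)
{
    unsigned int ix = threadIdx.x + blockIdx.x * blockDim.x;  // column
    unsigned int iy = threadIdx.y + blockIdx.y * blockDim.y;  // row
    unsigned int idx = iy * nx + ix;                          // row-major offset
    __shared__ int vec[4096];                                 // assumes nx <= 4096
    // The grid is rounded up, so skip threads that fall outside the image.
    if (ix < (unsigned int)nx && iy < (unsigned int)ny)
        pSrc[idx] = vec[ix];
}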
int dimx = 32, dimy = 32;
dim3 block(dimx, dimy);
dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
my_kernel<<<grid, block>>>(pSrc, nx, ny);
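
One way to check the reasoning is to model the bank mapping on the host. The sketch below is plain C++ (it should also compile with nvcc) and assumes 32 banks of 4 bytes each, 32-thread warps, and that threads in a block are numbered x-first, so with a 32x32 block each warp is one row of threads. Bank conflicts are only counted between threads of the same warp. The names `bank_of` and `warp_conflict_degree` and the `stride` parameter are hypothetical and only there for illustration; `stride = 1` corresponds to `vec[ix]` in the kernel above.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>

// Assumptions (not stated in the post): 32 banks, 4-byte bank width,
// 32-thread warps, threads linearized x-first within a block.
constexpr int kWarpSize = 32;
constexpr int kNumBanks = 32;

// Bank that holds 32-bit word `word_index` of a shared int array.
inline int bank_of(unsigned word_index) {
    return static_cast<int>(word_index % kNumBanks);
}

// Worst-case number of distinct words that land in one bank when a single
// warp executes `vec[ix * stride]`. 1 means conflict-free, N means N-way.
// `stride` is a hypothetical knob; stride = 1 models `vec[ix]`.
inline int warp_conflict_degree(int warp_id, int block_idx_x,
                                int block_dim_x, int stride) {
    assert(block_dim_x > 0 && stride > 0);
    std::map<int, std::set<unsigned>> words_per_bank;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        int linear_tid = warp_id * kWarpSize + lane;  // thread id within the block
        int tx = linear_tid % block_dim_x;            // threadIdx.x
        unsigned ix = static_cast<unsigned>(tx + block_idx_x * block_dim_x);
        unsigned word = ix * static_cast<unsigned>(stride);
        // Threads reading the same word are served by a broadcast,
        // so a set counts each word only once per bank.
        words_per_bank[bank_of(word)].insert(word);
    }
    int degree = 0;
    for (const auto& entry : words_per_bank)
        degree = std::max(degree, static_cast<int>(entry.second.size()));
    return degree;
}
```

Under these assumptions, `warp_conflict_degree(0, 0, 32, 1)` returns 1, because each lane of the warp hits a different bank, while a hypothetical `vec[ix * 32]` access, modeled as `warp_conflict_degree(0, 0, 32, 32)`, returns 32. Threads in the same column read the same word, but in this model they belong to different warps.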