I would like to know the size of each shared memory bank and how many banks there are.
In particular, I want to store nThreads*sizeof(float4) bytes in shared memory.
Each thread would access one float4 value: thread 0 accesses the 0th float4, thread 1 accesses the 1st float4, and so on.
I read through the Programming Guide but could not figure out how to reduce bank conflicts. Is padding a way to do it?
Also, do bank conflicts occur when threads in a half-warp access the same bank, or when any of the 32 threads in a warp access the same bank?
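
To make the access pattern concrete, here is a minimal sketch of what I have in mind. The kernel name stageFloat4, the buffer name tile, and the sizes in main are placeholders made up for illustration, not code from a real project. It only shows the layout (one float4 per thread, indexed by threadIdx.x) and says nothing about whether that layout is conflict-free.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stages exactly one float4 in shared memory.
__global__ void stageFloat4(const float4 *in, float4 *out, int n)
{
    // Dynamically sized shared buffer; the launch supplies
    // blockDim.x * sizeof(float4) bytes for it.
    extern __shared__ float4 tile[];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Thread t writes the t-th float4 (16 bytes, four consecutive 32-bit words).
    tile[threadIdx.x] = (gid < n) ? in[gid] : make_float4(0.0f, 0.0f, 0.0f, 0.0f);
    __syncthreads();

    // Thread t reads back the same t-th float4 and copies it to global memory.
    if (gid < n)
        out[gid] = tile[threadIdx.x];
}

int main()
{
    const int nThreads = 128;              // assumed block size, for illustration
    const int nBlocks  = 2;                // assumed grid size, for illustration
    const int n        = nThreads * nBlocks;
    const size_t bytes = n * sizeof(float4);

    std::vector<float4> hostIn(n), hostOut(n);
    for (int i = 0; i < n; ++i)
        hostIn[i] = make_float4(4.0f * i, 4.0f * i + 1.0f, 4.0f * i + 2.0f, 4.0f * i + 3.0f);

    float4 *devIn = nullptr, *devOut = nullptr;
    cudaMalloc(&devIn, bytes);
    cudaMalloc(&devOut, bytes);
    cudaMemcpy(devIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);

    // Third launch parameter: nThreads * sizeof(float4) bytes of shared memory per block.
    stageFloat4<<<nBlocks, nThreads, nThreads * sizeof(float4)>>>(devIn, devOut, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(hostOut.data(), devOut, bytes, cudaMemcpyDeviceToHost);

    // The kernel is a plain copy through shared memory, so output should equal input.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hostOut[i].x != hostIn[i].x || hostOut[i].y != hostIn[i].y ||
            hostOut[i].z != hostIn[i].z || hostOut[i].w != hostIn[i].w)
            ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(devIn);
    cudaFree(devOut);
    return mismatches != 0;
}
```

The question is whether the reads and writes of tile[threadIdx.x] in this kind of kernel cause bank conflicts, and if so, what change to the layout would avoid them.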