I have a quick question regarding the “Parallel Prefix Scan with CUDA” paper. In it, you state briefly that “when multiple threads in the same warp access the same bank, a bank conflict occurs, unless all threads access the same address within the 32-bit word”.
From the website: "Shared memory banks are organized such that successive 32-bit words are assigned to successive banks and the bandwidth is 32 bits per bank per clock cycle."
I am a little confused by this statement. Is an address not 32 bits on a 32-bit machine and 64 bits on a 64-bit machine? So, if a kernel used shared memory and each thread in the warp (a warp has 32 threads on my machine) accessed a different address, I would expect each thread to hit its very own bank, and therefore there should be no conflict.
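To make my mental model concrete, here is a small host-side C++ sketch of how I read the quoted sentence. It is only a model, not real device code: the constants (32 banks, 4 bytes per bank) and the helper names `bank_of`, `conflict_degree`, and `strided_offsets` are my own assumptions for illustration. It treats the bank as a function of the byte offset into shared memory, and treats threads that read the same 32-bit word as a single request.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed hardware parameters; these are my assumptions, not values from the paper.
constexpr int kNumBanks = 32;                 // number of shared memory banks
constexpr std::uint32_t kBankWidthBytes = 4;  // one 32-bit word per bank per cycle
constexpr int kWarpSize = 32;                 // threads per warp

using WarpOffsets = std::array<std::uint32_t, kWarpSize>;

// Bank holding the 32-bit word that contains this byte offset:
// successive words go to successive banks, wrapping around after kNumBanks.
int bank_of(std::uint32_t byte_offset) {
    return static_cast<int>((byte_offset / kBankWidthBytes) % kNumBanks);
}

// Largest number of DISTINCT 32-bit words requested from any single bank.
// 1 means conflict-free in this model; n means an n-way conflict.
// Threads reading the same word are counted once.
int conflict_degree(const WarpOffsets& byte_offsets) {
    std::array<std::set<std::uint32_t>, kNumBanks> words_per_bank;
    for (std::uint32_t offset : byte_offsets) {
        std::uint32_t word = offset / kBankWidthBytes;
        words_per_bank[bank_of(offset)].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}

// Byte offsets when thread t reads the 32-bit element at index t * stride_in_words.
WarpOffsets strided_offsets(std::uint32_t stride_in_words) {
    WarpOffsets offsets{};
    for (int t = 0; t < kWarpSize; ++t) {
        offsets[t] = static_cast<std::uint32_t>(t) * stride_in_words * kBankWidthBytes;
    }
    return offsets;
}
```

If this model is right, `conflict_degree(strided_offsets(1))` is 1, but `conflict_degree(strided_offsets(32))` is 32: every thread touches a different address, yet all of those addresses fall in bank 0. That would mean "a different address" does not imply "a different bank", which is the part I would like confirmed.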
Am I missing something here? Unfortunately, I cannot make sense of the CUDA docs to figure this one out. Please advise.