Hi, everyone!
I’m not sure whether my understanding of shared memory accesses is correct.
For devices of compute capability 1.x (16 banks), a warp’s shared memory request is split into two half-warp requests, so only a half-warp (16 threads) accesses shared memory at a time.
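To make the half-warp rule concrete, here is a minimal host-side C++ sketch (not CUDA device code) that counts the worst bank conflict for one half-warp on a 1.x device. It assumes 4-byte-wide banks, bank = 32-bit word index mod 16, and that every thread reads a different word, so the 1.x broadcast case is not modeled. `conflictDegree1x` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical model of a compute capability 1.x shared memory request.
// Assumptions: 16 banks, each 4 bytes wide; a warp's request is split into
// two half-warp requests; every thread reads a distinct 32-bit word
// (the 1.x broadcast case is not modeled).
constexpr int kBanks1x = 16;
constexpr int kHalfWarp = 16;

// Returns the conflict degree (1 = conflict-free) for one half-warp in which
// thread t reads the 32-bit word at index baseWord + t * strideWords.
int conflictDegree1x(int baseWord, int strideWords) {
    assert(baseWord >= 0 && strideWords > 0);  // stride 0 would be a broadcast
    std::array<int, kBanks1x> hits{};           // accesses landing in each bank
    for (int t = 0; t < kHalfWarp; ++t) {
        int word = baseWord + t * strideWords;
        ++hits[word % kBanks1x];                // bank = word index mod 16
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, a stride of 1 word gives 1 (conflict-free), a stride of 2 gives a 2-way conflict, and a stride of 16 puts all 16 threads in the same bank.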
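For devices of compute capability 2.x (32 banks), a request is serviced for a full warp (32 threads) at once. But for 64-bit accesses, the request is split so that only a half-warp (16 threads) accesses shared memory at a time. Since 16 threads × 8 bytes = 128 bytes, which covers all 32 four-byte banks exactly once, consecutive 64-bit accesses (e.g. one double per thread) cause no bank conflict.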
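A similar host-side C++ sketch for 2.x, assuming 32 four-byte banks, whole-warp servicing for 32-bit accesses, a split into two half-warp requests for 64-bit accesses, and no conflict when several threads read the same 32-bit word. `groupDegree` and `conflictDegree2x` are hypothetical helpers, not part of any CUDA API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical model of a compute capability 2.x shared memory request.
// Assumptions: 32 banks, each 4 bytes wide; 32-bit requests are serviced for
// the whole warp at once; 64-bit requests are split into two half-warp
// requests; threads reading the same 32-bit word do not conflict.
constexpr int kBanks2x = 32;
constexpr int kWarp = 32;

// Conflict degree of threads [first, first + count): thread t reads
// wordsPerThread consecutive 32-bit words starting at word index
// (baseElem + t * strideElems) * wordsPerThread.
int groupDegree(int first, int count, int baseElem, int strideElems,
                int wordsPerThread) {
    std::array<std::set<int>, kBanks2x> wordsInBank;  // distinct words per bank
    for (int t = first; t < first + count; ++t) {
        int firstWord = (baseElem + t * strideElems) * wordsPerThread;
        for (int w = 0; w < wordsPerThread; ++w) {
            int word = firstWord + w;
            wordsInBank[word % kBanks2x].insert(word);
        }
    }
    std::size_t worst = 0;
    for (const auto& s : wordsInBank) worst = std::max(worst, s.size());
    return static_cast<int>(worst);
}

// elemBytes is 4 (e.g. float/int) or 8 (e.g. double).
int conflictDegree2x(int elemBytes, int baseElem, int strideElems) {
    assert(elemBytes == 4 || elemBytes == 8);
    assert(baseElem >= 0 && strideElems >= 0);
    if (elemBytes == 4) {
        return groupDegree(0, kWarp, baseElem, strideElems, 1);  // whole warp
    }
    // 64-bit: two independent half-warp requests; report the worse one.
    const int half = kWarp / 2;
    return std::max(groupDegree(0, half, baseElem, strideElems, 2),
                    groupDegree(half, half, baseElem, strideElems, 2));
}
```

Under these assumptions, `conflictDegree2x(8, 0, 1)` (consecutive doubles) returns 1, while `conflictDegree2x(8, 0, 2)` (every other double) returns 2, since threads 0 and 8 of a half-warp then land on the same pair of banks.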
Could someone please tell me whether my understanding is correct? Thanks!