shared memory accesses for different compute capabilities

Hi, everyone!

I’m not sure whether my understanding of shared memory accesses is correct.

For devices of compute capability 1.x (16 banks), a warp’s shared memory request is split into two half-warp (16-thread) requests, so bank conflicts can only occur between threads of the same half-warp.
For devices of compute capability 2.x (32 banks), a full warp (32 threads) is serviced as one request for 32-bit accesses. For 64-bit accesses, bank conflicts are only evaluated within each half-warp (16 threads), so a linearly indexed array of doubles causes no bank conflict.
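To make the two cases above concrete, here is a rough host-side sketch in C++ (no GPU needed). It assumes a simplified model in which shared memory is a sequence of 32-bit words and word w sits in bank w % numBanks; the function name conflictDegree and its parameters are made up for illustration and are not part of any CUDA API. It ignores hardware details such as the exact broadcast rules, so treat it as a way to check the arithmetic, not as a description of the hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Simplified model (an assumption, not a hardware specification):
// word w of shared memory lives in bank (w % numBanks), and threads that
// are serviced together conflict when they touch *different* words in the
// same bank. The result is the largest number of distinct words that land
// in any single bank (1 means conflict-free).
//
// numBanks       : 16 for compute capability 1.x, 32 for 2.x
// groupSize      : number of threads serviced together (16 or 32)
// bytesPerAccess : 4 for 32-bit accesses, 8 for 64-bit accesses
// strideElems    : thread tid accesses element tid * strideElems
int conflictDegree(int numBanks, int groupSize, int bytesPerAccess, int strideElems) {
    assert(numBanks > 0 && groupSize > 0 && strideElems >= 0);
    assert(bytesPerAccess > 0 && bytesPerAccess % 4 == 0);

    const int wordsPerAccess = bytesPerAccess / 4;
    std::vector<std::set<long>> wordsPerBank(numBanks);

    for (int tid = 0; tid < groupSize; ++tid) {
        // First 32-bit word touched by this thread.
        const long firstWord = static_cast<long>(tid) * strideElems * wordsPerAccess;
        for (int k = 0; k < wordsPerAccess; ++k) {
            const long w = firstWord + k;
            wordsPerBank[w % numBanks].insert(w);  // same word twice is not a conflict
        }
    }

    std::size_t worst = 1;
    for (const auto& words : wordsPerBank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

Under this model, conflictDegree(16, 16, 4, 1) is 1 (1.x, 32-bit, per half-warp) and conflictDegree(32, 32, 4, 1) is 1 (2.x, 32-bit, full warp). For doubles, conflictDegree(32, 16, 8, 1) is 1 when each half-warp is considered separately, whereas conflictDegree(32, 32, 8, 1) would be 2 if all 32 threads were serviced together, which is why the half-warp handling matters for 64-bit accesses. The same 64-bit pattern on 16 banks, conflictDegree(16, 16, 8, 1), comes out as 2.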

Please let me know whether my understanding is correct. Thanks!

Yes, you are correct. The compute capabilities appendix of the CUDA C Programming Guide explains this quite clearly.

Thanks, hyqneuron! I’ll take a look!