Shared Memory Bank Conflict Clarification

I have read the Programming Guide and the Performance Guidelines, but shared memory bank conflicts are still not clear to me.
My questions:

  1. At any point in time, isn't more than one warp per block being executed on a streaming multiprocessor? If no, that clarifies everything. If yes:
  2. Isn't it highly likely that one bank is accessed by more than one thread, since the number of banks is much smaller than the number of possible threads?

1) If only one warp per block is executed at any given time, doesn't that imply that increasing the number of cores does not improve efficiency for fine-grained parallel problems?


I think the key is "at a time". Bank conflicts only concern accesses made during the same tick.

I think the hardware processes 16 or 32 threads (depending on the GPU, etc.) in one go.
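To make the "one tick" idea concrete, here is a small host-side Python model (not CUDA code and not an NVIDIA API; the function names are hypothetical). It assumes 32-bit words, bank = word index mod the number of banks, and that conflicts are only counted among threads served by the same request: a warp of 32 threads with 32 banks on CC 2.x, or a half-warp of 16 threads with 16 banks on CC 1.x. Threads reading the same word are counted once (broadcast). In a 256-thread block where thread `t` reads word `t`, threads 0, 32, 64, ... all map to bank 0, yet no group has a conflict because those threads belong to different warps.

```python
from collections import defaultdict

def conflict_degree(word_indices, num_banks):
    """Max number of distinct 32-bit words mapped to one bank in a single request.

    1 means conflict-free; n means an n-way conflict (the request is serialized n times).
    Threads reading the same word are counted once (broadcast).
    """
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[w % num_banks].add(w)
    return max(len(ws) for ws in words_per_bank.values())

def per_group_degrees(block_words, group_size, num_banks):
    """Split a block's per-thread accesses into warps/half-warps and rate each group separately."""
    return [
        conflict_degree(block_words[i:i + group_size], num_banks)
        for i in range(0, len(block_words), group_size)
    ]

block_words = list(range(256))  # hypothetical pattern: thread t reads shared word t

# Threads in the whole block that hit bank 0 (assuming 32 banks)
print([t for t, w in enumerate(block_words) if w % 32 == 0])

# If all 256 threads were one request, each bank would hold 8 distinct words
print(conflict_degree(block_words, num_banks=32))

# What the hardware actually does: one request per warp (CC 2.x) or half-warp (CC 1.x)
print(per_group_degrees(block_words, group_size=32, num_banks=32))
print(per_group_degrees(block_words, group_size=16, num_banks=16))
```

The first line prints `[0, 32, 64, 96, 128, 160, 192, 224]` and the second prints `8`, but every per-warp and per-half-warp degree is `1`: sharing a bank across the block does not matter, only sharing it within one request does.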


What is the compute capability of your device? On CC 2.x there are 32 banks, and bank conflicts are evaluated per warp: if the threads of a warp all access different banks, there is no bank conflict.
On CC 1.x there are 16 banks, and bank conflicts are evaluated per half-warp: if the threads of a half-warp all access different banks, there is still no bank conflict.
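The per-warp and per-half-warp rules above can be checked for the common strided pattern with a short Python sketch (a model, not CUDA code; `stride_conflict` is a hypothetical helper). It assumes thread `t` reads 32-bit word `t * stride`, as in an access like `s[threadIdx.x * stride]` on a hypothetical shared array `s`, with bank = word index mod the number of banks. Threads reading the same word count once (broadcast); the patterns below either read all-distinct words or a single shared word.

```python
from collections import defaultdict

def stride_conflict(stride, threads, num_banks):
    """n-way conflict degree when thread t (0 <= t < threads) reads 32-bit word t * stride."""
    words_per_bank = defaultdict(set)
    for t in range(threads):
        word = t * stride
        # Several threads reading the same word are a broadcast, so the set counts it once
        words_per_bank[word % num_banks].add(word)
    return max(len(ws) for ws in words_per_bank.values())

for stride in (0, 1, 2, 3, 16, 32):
    cc2 = stride_conflict(stride, threads=32, num_banks=32)  # one warp, 32 banks
    cc1 = stride_conflict(stride, threads=16, num_banks=16)  # one half-warp, 16 banks
    print(f"stride {stride:2d}: CC 2.x {cc2}-way, CC 1.x {cc1}-way")
```

Under these assumptions, strides 0, 1 and 3 are conflict-free on both, stride 2 is 2-way on both, stride 16 is 16-way on both, and stride 32 is 32-way on CC 2.x but 16-way on CC 1.x. For a nonzero stride with as many threads as banks, the degree works out to gcd(stride, number of banks), which is why odd strides avoid conflicts with a power-of-two bank count.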

For GPUs up to and including GT200 (CC 1.x), each multiprocessor has a single warp scheduler, so an instruction from only one warp can be issued per scheduler clock.
For Fermi GPUs (CC 2.x), each multiprocessor has two warp schedulers, so instructions from two warps can be issued per scheduler clock.