There are 8 ALUs in one multiprocessor(MP) but the shared memory is divided into 16 banks. Why ?
My understanding is that at any time there are 8 threads in the MP and therefore one can issue 8 read/write instructions. Where do the other 8 come from ?
[*] NVidia say that G80 executes 8 threads really at the same time (quarter warp)
[*] Memory access (both shared memory banks and global memory coalescing) is based on the concept of a half warp (16 threads)
[*] Thread divergence can happen if flow control takes an alternative path within a warp (32 threads)
It has to do with relative clock speeds, AFAIK. The memory controller runs at 1/2 times the clock speed of the processors, and whatever schedules warps runs at 1/4th.
thanks. That clears up a lot of confusion.