I’m going to buy a GTX 460 in a month, so I just started reading up on it. I realized it uses the GF104, which differs from the GF100, so I have some questions. Thanks a lot if you can help with any of them!
Each SM in the GF104 has 2 warp schedulers and 4 dispatch units. Since each warp scheduler has 2 dispatch units, I assume those two dispatch units issue instructions from the same warp at the same time. I guess they could issue independent instructions from the same half-warp to achieve ILP. Or could the 2 dispatch units under one scheduler instead issue the same instruction to each half-warp, so that all 32 threads in the warp execute concurrently instead of the usual 16?
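To check my understanding of the ILP idea, here is a toy model of in-order dual issue from a single warp. It is only a sketch under my own assumptions, not how the GF104 scheduler actually decides: instructions are (destination, sources) register tuples in program order, a result is available one issue cycle after it is issued, and up to two consecutive instructions issue together as long as the second one does not read a register written by the first. The names `issue_cycles`, `chain`, and `independent` are hypothetical.

```python
def issue_cycles(program, width=2):
    """Issue cycles needed for one warp's instruction stream (toy model).

    program: list of (dest_register, [source_registers]) in program order.
    width: instructions the scheduler may dispatch per cycle
    (2 here, one per dispatch unit; an assumption of this model).
    """
    cycles = 0
    i = 0
    while i < len(program):
        written_this_cycle = set()
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if any(src in written_this_cycle for src in srcs):
                break  # depends on an instruction issued this cycle: wait
            written_this_cycle.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles

# A dependent chain cannot pair up; four independent instructions can.
chain = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]
independent = [("r1", ["r0"]), ("r2", ["r0"]), ("r3", ["r0"]), ("r4", ["r0"])]
print(issue_cycles(chain))        # 4
print(issue_cycles(independent))  # 2
```

In this model only the independent stream benefits from the second dispatch unit; the dependent chain takes as many cycles as with `width=1`.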
It seems that the warp schedulers and dispatch units run at half the SP clock, so on average only two instructions can be issued per SP cycle (4 dispatch units at half speed). But each instruction only works on 16 SPs, right? Wouldn’t that mean the schedulers can keep at most 2/3 of the 48 SPs in a GF104 SM occupied (2 × 16 = 32 of 48)?
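Here is the arithmetic behind that question as a small function, using only the assumptions above plus the 48 SPs per GF104 SM. `sp_utilization` is a hypothetical helper, and `busy_cycles_per_instruction` is a knob I added as an assumption: it is the number of SP cycles one dispatched instruction keeps its 16 SPs busy. The 2/3 figure only follows if that value is 1.

```python
from fractions import Fraction

def sp_utilization(dispatch_units, scheduler_clock_ratio, lanes_per_instruction,
                   total_sps, busy_cycles_per_instruction=1):
    """Upper bound on the fraction of SPs the dispatch units can keep busy.

    scheduler_clock_ratio: scheduler clock / SP clock (1/2 in the question).
    busy_cycles_per_instruction: SP cycles one dispatched instruction occupies
    its lanes; the question implicitly assumes 1 (an assumption, not a spec value).
    """
    instructions_per_sp_cycle = dispatch_units * Fraction(scheduler_clock_ratio)
    busy_lanes = (instructions_per_sp_cycle * lanes_per_instruction
                  * busy_cycles_per_instruction)
    return min(Fraction(1), busy_lanes / total_sps)

# Numbers from the question: 4 dispatch units at half the SP clock,
# 16 SPs per instruction, 48 SPs per GF104 SM.
print(sp_utilization(4, Fraction(1, 2), 16, 48))  # 2/3
```

With `busy_cycles_per_instruction=2` the same function returns 1, so the answer to the question hinges on how long each instruction actually occupies its group of SPs.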
Which other units run at half the SP clock?
I’m also not sure about the latencies of various operations:
Uncached global memory access takes 400 cycles, right?
Accessing the texture cache takes 400 cycles, and whether the access pattern is coalesced does not matter at all, correct?
Is the L1 cache affected by bank conflicts too?
Shared memory access takes 0 cycles as long as there are no bank conflicts, is that right?
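To make sure I understand what “coalesced” and “bank conflict” mean in these questions, here is a toy model of both for one warp of 32 threads. Assumptions: global loads are served in aligned segments (128 bytes by default, adjustable), shared memory has 32 banks that are each 4 bytes wide, and threads reading the same 32-bit word get a broadcast rather than a conflict. The function names are hypothetical, and the model only counts segments or serialized bank accesses; it says nothing about latency in cycles.

```python
from collections import defaultdict

WARP_SIZE = 32

def global_transactions(byte_addresses, segment_bytes=128):
    """Distinct aligned segments touched by one warp's load (toy coalescing model)."""
    return len({addr // segment_bytes for addr in byte_addresses})

def bank_conflict_degree(byte_addresses, num_banks=32, bank_width=4):
    """Worst-case number of distinct words mapped to one bank (1 = conflict-free).

    Threads reading the same word count once (broadcast, not a conflict).
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // bank_width
        words_per_bank[word % num_banks].add(word)
    return max(len(words) for words in words_per_bank.values())

def float_addresses(stride):
    """Byte addresses of a[stride * tid] for 4-byte floats, tid = 0..31."""
    return [4 * stride * tid for tid in range(WARP_SIZE)]

for stride in (1, 2, 32):
    addrs = float_addresses(stride)
    print(stride, global_transactions(addrs), bank_conflict_degree(addrs))
# prints:
# 1 1 1
# 2 2 2
# 32 32 32
```

Under these assumptions, a unit stride touches one segment and no bank twice, while a stride of 32 floats needs 32 segments and serializes all 32 threads on bank 0.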
I think I still have many more questions… but maybe I shouldn’t put too many in one topic. Thanks a lot if you could help me answer some of the questions above!