Can anybody explain in detail how the CUDA programming model is mapped onto the hardware of the G80?
I mean: how are grids, blocks, warps and threads processed, and by which hardware components?
I know that a block is mapped to a multiprocessor, but I don't understand how a warp can physically run in parallel when there are 32 threads in a warp and, let's say, just 8 streaming processors in one multiprocessor.
In my opinion only 8 threads of a warp can physically run in parallel at any one time, but I think I'm wrong.
So please, can someone help me?