Just starting to learn this thing… I’m kinda confused by the warp concept.
So, a kernel contains BLOCKs, and a BLOCK contains THREADs.
Each BLOCK is run on one Multiprocessor, and each THREAD in it is run by one of that Multiprocessor's Stream Processors (SPs). (alright so far?)
And then come the ALUs, which operate at twice the clock of the SPs, so we should have a multiple of 16 THREADs in each BLOCK.
So what’s the deal with WARPs? Why does the hardware split the THREADs of a BLOCK into groups called WARPs? The guide says each WARP contains the same number of THREADs, but how many exactly? And how are the THREADs divided among them?
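
To make the question concrete, here is a rough sketch of how I think the division works. It assumes a warp size of 32 (the value CUDA device code exposes as `warpSize`) and that threads with consecutive IDs within a block go into the same warp, with thread 0 in the first warp. It is plain host-side C++ that only does the index arithmetic. The names `kWarpSize`, `linear_thread_id`, `warp_of`, `lane_of` and `warps_per_block` are made up for illustration and are not part of the CUDA API.

```cpp
#include <cassert>

// Assumption: 32 threads per warp. Real device code should read the
// built-in warpSize instead of hard-coding this.
const int kWarpSize = 32;

// Hypothetical helper: flatten a (x, y, z) thread index inside a block of
// dimensions (dim_x, dim_y, dim_z) into one linear ID, with x varying fastest.
int linear_thread_id(int x, int y, int z, int dim_x, int dim_y) {
    return x + y * dim_x + z * dim_x * dim_y;
}

// Hypothetical helper: index of the warp (within its block) that a thread
// with this linear ID belongs to. IDs 0..31 are warp 0, 32..63 are warp 1, ...
int warp_of(int tid) {
    assert(tid >= 0);
    return tid / kWarpSize;
}

// Hypothetical helper: position of the thread inside its warp (0..31).
int lane_of(int tid) {
    assert(tid >= 0);
    return tid % kWarpSize;
}

// Hypothetical helper: number of warps needed to cover a block.
// The division rounds up, so the last warp may be only partially filled.
int warps_per_block(int threads_in_block) {
    assert(threads_in_block > 0);
    return (threads_in_block + kWarpSize - 1) / kWarpSize;
}
```

If that is right, a block of 48 threads would still occupy 2 warps, with the second one only half full, which would be a reason to pick block sizes that are a multiple of the warp size.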