My question is about blocks and warps. I understand that each SM (on G80, for example) has room for 8 blocks.
When a block is executed, it is split into warps, each containing at most 32 threads.
According to all the CUDA documentation, all threads within a given warp perform the same instruction.
My question is: how exactly is a warp built, and how can we be sure that each thread will execute the same instruction before we actually get into "execution" mode?
if ( C )
Edit: my question is also related to branch divergence.