The warp is split across 2 cycles, 16 threads at a time. Each of the “4 processing blocks with 16 cores each” is referred to as an SMSP (SM Sub-Partition). Although it answers a question about instruction latency, Greg’s answer here may clarify things. His “EXAMPLE 1: 1 Warp per SM Sub-partition” shows the ALU active for two consecutive cycles, processing all 32 threads.
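To make the arithmetic concrete, here is a minimal sketch in Python (a toy model, not anything from NVIDIA) of how a 32-thread warp maps onto 16 lanes. The function names `issue_cycles` and `lane_schedule` are made up for illustration, and the assumption that threads 0-15 go in the first cycle and 16-31 in the second is mine; the hardware does not document which half is processed first.

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def issue_cycles(lanes_per_smsp, warp_size=WARP_SIZE):
    """Cycles a unit with `lanes_per_smsp` lanes is busy with one warp instruction."""
    return math.ceil(warp_size / lanes_per_smsp)


def lane_schedule(lanes_per_smsp, warp_size=WARP_SIZE):
    """Thread indices handled in each cycle.

    ASSUMPTION: threads are taken in ascending order, one group of
    `lanes_per_smsp` per cycle. The real ordering is not documented.
    """
    return [
        list(range(start, min(start + lanes_per_smsp, warp_size)))
        for start in range(0, warp_size, lanes_per_smsp)
    ]


if __name__ == "__main__":
    # 16 lanes per SMSP, as in the "4 processing blocks with 16 cores each" case
    print(issue_cycles(16))
    for cycle, threads in enumerate(lane_schedule(16)):
        print(f"cycle {cycle}: threads {threads[0]}..{threads[-1]}")
```

With 16 lanes the model gives two cycles per warp instruction, matching the two consecutive active ALU cycles in Greg's example; with 32 lanes it would give one.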