Yeah, I suspect it is a two-fold win: there is no need for a third warp scheduler, and there is no additional pressure to increase the maximum number of active warps (and the size of the register file along with it). An SM with 3 schedulers and 3 sets of 16 CUDA cores would probably need even more active warps to keep all the pipelines full. By spending some die area to make the two schedulers superscalar, you save elsewhere. (And you have to trust that your compiler writers can generate code that exposes as much ILP as possible.)
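The warps-versus-ILP trade-off can be roughed out with Little's law: the number of independent instructions that must be in flight is about latency × issue rate, and each warp can supply only as many of those as its ILP allows. Below is a minimal Python sketch under that assumption; the `warps_needed` helper and the latency value are hypothetical and purely illustrative, and the model ignores memory latency, register limits and the real details of Fermi instruction issue.

```python
import math


def warps_needed(latency_cycles: int, issue_rate: int, ilp: int) -> int:
    """Rough Little's-law estimate of resident warps needed to keep pipelines full.

    latency_cycles: assumed cycles before a dependent instruction can issue (hypothetical)
    issue_rate:     warp-instructions that must start per cycle to keep all pipelines busy
    ilp:            independent instructions each warp can expose at a time
    """
    in_flight = latency_cycles * issue_rate  # independent instructions needed in flight
    return math.ceil(in_flight / ilp)        # each warp covers `ilp` of them


LATENCY = 20  # hypothetical latency in cycles, for illustration only

for issue_rate in (2, 3):
    for ilp in (1, 2):
        print(f"issue rate {issue_rate}, ILP={ilp}: ~{warps_needed(LATENCY, issue_rate, ilp)} warps")
```

In this toy model, feeding a third pipeline with ILP = 1 raises the warp count by half, while doubling the ILP per warp halves it again. That is the saving the superscalar schedulers count on the compiler to deliver; the absolute numbers depend entirely on the assumed latency.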