GTX 460 - how many angels on the head of a pin: how many cores per MP for a GTX 460 - 32 or 48?

Yeah, I suspect it is a two-fold win: no need for a third warp scheduler, and no additional pressure to increase the maximum number of active warps along with the size of the register file. An SM with 3 schedulers and 3 sets of 16 CUDA cores would probably need even more active warps to keep all the pipelines full. By spending some die area to make the two schedulers superscalar, you save elsewhere. (And you have to trust that your compiler writers can generate code that exposes as much ILP as possible.)
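
To put rough numbers on the "32 or 48" question, here is a back-of-the-envelope sketch in Python. It is a toy model, not a description of the real hardware: it assumes each scheduler can always find one ready warp per issue cycle, that a second instruction is issued only when the same warp has an independent one available, and that each issued warp instruction keeps one set of 16 cores busy. The function name `busy_cores` and the `ilp_fraction` parameter are made up for illustration, and load/store units, SFUs and latencies are ignored entirely.

```python
def busy_cores(schedulers, issue_width, core_groups, ilp_fraction,
               cores_per_group=16):
    """Estimate how many CUDA cores an SM keeps busy per issue cycle.

    ilp_fraction is the (assumed) fraction of issue cycles in which a
    scheduler finds a second independent instruction in the same warp.
    """
    # Each scheduler issues 1 instruction, plus extra ones when ILP allows.
    issued = schedulers * (1 + (issue_width - 1) * ilp_fraction)
    # Issued instructions beyond the number of core groups have nowhere to go.
    used_groups = min(issued, core_groups)
    return used_groups * cores_per_group


# Two dual-issue schedulers feeding 3 sets of 16 cores, at varying ILP.
for ilp in (0.0, 0.25, 0.5, 1.0):
    print(ilp, busy_cores(2, 2, 3, ilp))

# The hypothetical alternative: three single-issue schedulers.
print(busy_cores(3, 1, 3, 0.0))
```

Under these assumptions, code with no exploitable ILP behaves like a 32-core SM, and the full 48 cores are only reached once the schedulers can dual-issue about half the time. The three-scheduler variant reaches 48 without any ILP, but only by needing a third ready warp every cycle, which is the extra pressure on active warps and register file mentioned above.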