Fermi architecture

Hello.

  1. Does anyone know where I can find more detailed information about the Fermi architecture than what is in the whitepaper (for example, what the “Interconnect Network” is, what a processor cycle and a memory cycle are, etc.)?
  2. Cards with compute capability 2.1 have 48 cores per SM. Why does the profiler say that the warp size is 32?

The warp size is independent of the number of cores per SM. Pre-Fermi devices had only 8 cores per SM, and the warp size was still 32. Compute capability 2.0 cards have 32 cores per SM, but (except for double precision instructions) those 32 cores do not all execute the same warp. Instead, the two instruction schedulers each issue a different instruction per clock to two groups of 16 cores within the SM. Compute capability 2.1 adds a third group of 16 cores, so that one of the instruction schedulers can co-issue two independent instructions from the same warp. As a result, compute capability 2.1 devices can complete up to 3 warp instructions every 2 clocks, given a favorable instruction stream.
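To make the arithmetic concrete, here is a rough sketch in C++ of the peak numbers above. It assumes a simplified model in which every core retires one thread-instruction per clock and the schedulers always have something to issue; the function name `peak_warp_instructions` and its parameters are made up for illustration and are not part of any NVIDIA API.

```cpp
// Warp size on every device generation discussed in this thread.
constexpr int kWarpSize = 32;

// Simplified peak model (an assumption, not NVIDIA's documented pipeline):
// each core retires one thread-instruction per clock, so `core_groups`
// groups of `cores_per_group` cores can finish this many whole warp
// instructions in `clocks` clocks.
constexpr int peak_warp_instructions(int core_groups, int cores_per_group,
                                     int clocks) {
    return core_groups * cores_per_group * clocks / kWarpSize;
}

// Pre-Fermi: one group of 8 cores needs 4 clocks for one warp instruction.
static_assert(peak_warp_instructions(1, 8, 4) == 1, "pre-Fermi");

// Compute capability 2.0: two groups of 16 cores, 2 warp instructions per 2 clocks.
static_assert(peak_warp_instructions(2, 16, 2) == 2, "compute capability 2.0");

// Compute capability 2.1: three groups of 16 cores, 3 warp instructions per 2 clocks.
static_assert(peak_warp_instructions(3, 16, 2) == 3, "compute capability 2.1");
```

The warp size stays at 32 in all three cases; only the number of clocks a warp instruction occupies, and how many groups can work in parallel, changes. On compute capability 2.1 the third group is only kept busy when a scheduler finds two independent instructions in the same warp, so real code can fall short of this peak.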

The Programming Guide is the most comprehensive source of information. If something is not in there, Nvidia usually has not documented it at all. It is also quite readable.