GPU architecture and warp scheduling

I said “most”

Volta (sm_70) went back to single-issue, I believe, and also doubled the number of warp schedulers per SM compared to an sm_60 SM. NVIDIA discusses the reasons for this in presentations such as GTC 2017 "Inside Volta" (you may have to listen to the recording).

Fermi cc 2.0 was not dual-issue either, although cc 2.1 was dual-issue capable. Kepler, Maxwell, and Pascal should all be dual-issue capable, I believe.

The Kepler description in the programming guide certainly indicates this:
[url]Programming Guide :: CUDA Toolkit Documentation

Here’s a comment from Greg at NV indicating Kepler and Maxwell are dual-issue capable:

[url]Understanding CUDA scheduling - CUDA Programming and Performance - NVIDIA Developer Forums

I admit that seems to contradict the wording of the programming guide's description of cc 5.0.

I don’t have a crisp explanation for every reference you have found, but thanks for pointing those out.

I would certainly like to retract my statement about warp assignment. I agree that the indications are that in Volta it is static, with no migration. I think this must have changed at some point, but I'm not really sure when. Maybe it has been static assignment all the way back to Fermi.
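
As a rough summary, the per-architecture claims above can be tabulated in a small lookup, sketched here in Python. The names `DUAL_ISSUE_CLAIMS` and `dual_issue_capable` are hypothetical, and the values only restate the beliefs expressed in this post; they are not read from hardware or from any CUDA API, and the Maxwell entry is the one that the cc 5.0 programming guide wording appears to contradict.

```python
# Dual-issue capability per compute capability, as claimed in this post.
# These are stated beliefs, not values queried from a device or an API.
DUAL_ISSUE_CLAIMS = {
    (2, 0): False,  # Fermi cc 2.0: not dual-issue
    (2, 1): True,   # Fermi cc 2.1: dual-issue capable
    (7, 0): False,  # Volta sm_70: back to single-issue (believed)
}

# Architectures the post treats as a whole, keyed by major version.
DUAL_ISSUE_CLAIMS_BY_MAJOR = {
    3: True,  # Kepler
    5: True,  # Maxwell (the cc 5.0 guide wording seems to disagree)
    6: True,  # Pascal
}


def dual_issue_capable(major, minor):
    """Return the post's claim for cc major.minor, or None if not covered."""
    if (major, minor) in DUAL_ISSUE_CLAIMS:
        return DUAL_ISSUE_CLAIMS[(major, minor)]
    # Falls back to the per-architecture claim; None means "not discussed".
    return DUAL_ISSUE_CLAIMS_BY_MAJOR.get(major)
```

A result of `None` means the post makes no claim about that compute capability, which is deliberately kept distinct from `False`.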