I am teaching a course on CUDA, and I posted the following message to the “Teaching and Curriculum Support” forum here on devtalk. Unfortunately I haven’t received an answer yet, as that forum seems to be less active than others. Please forgive me for cross-posting, but I would really appreciate an explanation that I can include in my slides.
I am updating my slides on the NVIDIA architectures and adding information about Pascal. I noticed that in Pascal (GP100) every SM is divided into two processing blocks. Each processing block has 32 SPs, one warp scheduler and two dispatch units.
Now, I understand why there are two dispatch units per warp scheduler in Kepler and Maxwell, but not in Pascal. Each processing block has 32 SPs, which is exactly the size of a warp. Why do we need two dispatch units? If one warp instruction already occupies all 32 SPs, where is the second instruction executed?
I couldn’t find anything about this in the Pascal white paper or elsewhere on the Internet. Everyone just mentions that there are two dispatch units, but not why.
I hope someone can shed some light on this.