The “hardware” pictures in the CUDA C Programming Guide are too “virtual”, too “vague”, too “abstract”; they don't go “deep enough”.
They only show “cores”…
But what is a “core”? Apparently “core” means a “streaming multiprocessor” (SM), but the picture doesn't make this clear.
Also, in reality a streaming multiprocessor contains multiple “CUDA cores”. That is not obvious from the guide either.
And a “CUDA core” executes only one thread at a time. The guide doesn't make this clear either.
“Warps” are apparently a way of grouping threads, 32 at a time, so that they can be executed together on the CUDA cores more efficiently. This is somewhat clear, but it could be explained better.
I understand that the guide tries to remain “general”, because architectures may change in the future and every GPU can be slightly different.
But giving some concrete examples of “compute 1.x” hardware and “compute 2.x” hardware would help a lot.
There are probably presentations on the internet that do show “sub cores” inside “cores”.
I call a CUDA core a “sub core”.
Perhaps it could also be called a “thread core”, which simply means it executes one thread at a time.
I call a streaming multiprocessor a “core” (a core can have multiple sub cores).
I think the guide needs to be clearer about how the hardware is actually structured, because I have seen many posts on this forum from people who were confused and drew wrong conclusions.
That's why it is especially important that the guide be completely clear about how the hardware is structured, to clear up these confusions and misconceptions.