The number of GPU cores is often quoted as multiprocessors times cores per multiprocessor, but I am confused about its relevance. The CUDA documentation seems to talk about multiprocessors, blocks and threads, but never cores. Where do they fit in?
For example, the throughput (section 5.4.1 in the CUDA programming guide) is given per multiprocessor, so having more cores doesn’t seem to speed things up. Or is this actually the throughput per core? Should I start at least as many blocks as there are multiprocessors, or as many as there are cores?