What is "cores per SM"?

The hardware properties include something called “cores per multiprocessor”. This is typically 8, 32, 48, or 192, but I cannot find a definition of ‘core’ in any of the documentation or in the two books on CUDA programming that I have. I’m just curious what this is. Thanks!

A CUDA core is an arithmetic pipeline capable of issuing one single-precision floating-point instruction per clock cycle. A fused multiply-add (FMA) counts as two floating-point operations in peak-FLOPS figures. CUDA core count and clock frequency can be used to compare the theoretical peak single-precision performance of two different NVIDIA GPUs.
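As a rough illustration of that comparison, here is a minimal C++ sketch that turns SM count, cores per SM, and clock into a theoretical peak. It assumes each core completes one FMA per cycle and counts each FMA as two floating-point operations, the usual convention for peak figures. The function name `peakSpGflops` and the example numbers are hypothetical and do not describe any particular GPU.

```cpp
#include <cassert>

// Hypothetical helper: theoretical peak single-precision throughput in GFLOPS.
// Assumes every CUDA core completes one fused multiply-add (FMA) per clock
// and counts each FMA as 2 floating-point operations.
double peakSpGflops(int smCount, int coresPerSm, double clockGhz) {
    assert(smCount > 0 && coresPerSm > 0 && clockGhz > 0.0);
    const double totalCores = static_cast<double>(smCount) * coresPerSm;
    // cores * (1e9 cycles/s per GHz) * 2 flops/cycle = GFLOPS
    return totalCores * clockGhz * 2.0;
}
```

For a hypothetical GPU with 8 SMs of 192 cores each at 1.0 GHz, `peakSpGflops(8, 192, 1.0)` gives 3072 GFLOPS. Comparing two GPUs this way only compares ceilings; real kernels are usually limited by memory bandwidth, occupancy, or instruction mix well before they reach them.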

As a CUDA programmer you should completely avoid thinking in terms of CUDA cores, as they are not relevant to the design, implementation, or performance of a kernel.

An NVIDIA GPU contains 1 to N Streaming Multiprocessors (SMs). Each SM has 1 to 4 warp schedulers. Each warp scheduler has a register file and multiple execution units. The execution units may be exclusive to one warp scheduler or shared between schedulers. Execution units include CUDA cores (FP/INT), special function units, texture units, and load/store units. The Fermi and Kepler white papers provide additional information.
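The per-SM values the question lists (8, 32, 48, 192) vary by architecture. `cudaDeviceProp` has no field for them, so tools typically derive the number from the compute capability (`major`, `minor`); the CUDA samples' `deviceQuery` uses a lookup table for this. Below is a minimal C++ sketch of such a table. It covers only a few generations (compute 1.x, Fermi 2.0/2.1, Kepler 3.x, Maxwell 5.x); the function name `coresPerSm` is hypothetical, and newer architectures would need additional entries.

```cpp
#include <cassert>

// Hypothetical lookup: FP32 CUDA cores per SM for a given compute capability.
// Returns -1 for compute capabilities this sketch does not cover.
int coresPerSm(int major, int minor) {
    switch (major) {
        case 1: return 8;                       // compute 1.x
        case 2: return (minor == 0) ? 32 : 48;  // Fermi: 2.0 = 32, 2.1 = 48
        case 3: return 192;                     // Kepler (SMX)
        case 5: return 128;                     // Maxwell (SMM)
        default: return -1;                     // not covered here
    }
}
```

Multiplying the result by the device's SM count gives the total core count, e.g. 8 SMs at compute 3.0 gives 8 × 192 = 1536 cores.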


Greg - Thank you. That was very helpful. I had guessed that it was something like that, but it’s nice to have confirmation. I’ll hunt around here for the white papers.

Tim

Thanks for stating this concept very clearly. I think these bits of wisdom should be included in a FAQ :) Thanks a lot.

How do I check the number of SMs on a GPU?

Call cudaGetDeviceProperties() for the device you want to query, then read the multiProcessorCount field of the cudaDeviceProp structure that the function fills in.