Mapping between CUDA cores and threads

Dear all,

I have a question about CUDA cores and threads. Imagine I have a GPU with 448 cores; will each thread run on one core? Does that mean I can run at most about 448 threads in parallel at a time, regardless of my threads/block configuration?

Thanks a lot.


As it happens, yes. Since your GPU has 448 cores, it is not a compute capability 2.1 device, where multiple cores would work on one thread. On all other GPUs, a given thread will always be scheduled to the same core.

However, this is not something you should care about at all. On a fully loaded GPU, there are many more threads in flight than there are cores (at least 24× more threads than cores). Whether a particular thread always ends up on the same core or on different ones, or on more than one core at the same time, is an implementation detail you should not worry about.


Hi, thanks for the reply.

Another little question: how can I find out the maximum number of threads I can run on the device?


cudaGetDeviceProperties() will give you a struct with some useful fields:

multiProcessorCount
maxThreadsPerBlock
maxThreadsPerMultiProcessor
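Here is a minimal sketch of such a query (device 0 and the printed labels are just assumptions for illustration; the struct fields are the ones listed above):

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        cudaDeviceProp prop;
        cudaError_t err = cudaGetDeviceProperties(&prop, 0);   /* query device 0 */
        if (err != cudaSuccess) {
            printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("multiProcessorCount:         %d\n", prop.multiProcessorCount);
        printf("maxThreadsPerBlock:          %d\n", prop.maxThreadsPerBlock);
        printf("maxThreadsPerMultiProcessor: %d\n", prop.maxThreadsPerMultiProcessor);
        return 0;
    }

Compile with nvcc; the product multiProcessorCount × maxThreadsPerMultiProcessor is the maximum number of threads that can be resident on the GPU at the same time.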


Hi seibert, thanks for the info. My GPU has multiProcessorCount = 14 and maxThreadsPerMultiProcessor = 1536. Does that mean the maximum number of threads running on the GPU at once is 14 × 1536 = 21504? And if I exceed this value, will the kernel raise an exception?


If you exceed that value, the remaining blocks are simply started as earlier blocks finish. You do not need to worry about this unless you play dirty tricks to implement inter-block communication.

Ideally, you start a lot more threads than 21504, so that different GPUs all get maxed out (and not too much computing power is wasted towards the end of the kernel, when the GPU is only partially loaded).
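To illustrate (the kernel, problem size, and block size below are made up for the example, not anything specific to your GPU): launching far more threads than can be resident is perfectly normal, and the hardware simply runs the blocks in waves:

    #include <cuda_runtime.h>

    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* one thread per element */
        if (i < n)                                      /* guard the last, partial block */
            data[i] *= factor;
    }

    int main(void)
    {
        const int n = 10000000;                  /* ~10M threads, far more than 21504 */
        float *d_data;
        cudaMalloc((void**)&d_data, n * sizeof(float));

        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;   /* ~39063 blocks */

        /* Only some blocks are resident at any moment; the rest are scheduled
           automatically as earlier blocks retire. */
        scale<<<blocks, threadsPerBlock>>>(d_data, 2.0f, n);
        cudaDeviceSynchronize();

        cudaFree(d_data);
        return 0;
    }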


But is there a theoretical limit on the maximum number of threads for a kernel? Or can I ideally launch millions and millions of threads at a time?


The limits on the number of blocks per kernel launch are given in Appendix F of the CUDA C Programming Guide.
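If you prefer to query those limits at runtime instead of looking them up in the appendix, the same cudaDeviceProp struct exposes them. A small sketch, again assuming device 0:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   /* device 0 assumed */

        /* Maximum threads per block and per block dimension */
        printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);
        printf("maxThreadsDim:      %d x %d x %d\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);

        /* Maximum number of blocks per grid dimension */
        printf("maxGridSize:        %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
        return 0;
    }

The product of these limits is astronomically large, so millions of threads per launch are no problem in practice; the constraint you are more likely to hit is the per-dimension grid size, not the total thread count.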
