I need some clarification on the terms Blocks, Threads, Multiprocessors, and CUDA Cores, and what the maximum value for each one is. I have an EVGA GTX 560 Ti 2GB (Fermi) GPU.
From what I gathered: there are 32 CUDA cores per multiprocessor (SM)?
Each SM can execute 46 warps,
each warp can execute 32 threads,
and the number of threads running in parallel matches the number of CUDA cores (384 processor cores in my case).
Are these values correct?
So I have 384/32 = 12 SMs, meaning that I can only have 12 × 46 × 32 = 17664 threads active at once (but not technically running in parallel)? But this does not seem correct.
But wait, what about blocks, where do they fit into the picture? How many blocks can I have? Are they in ratio to the number of CUDA cores that I have? Is there any point to having more blocks than CUDA cores in terms of performance?
I have read a bunch of the NVIDIA programming guides, and these matters remain very unclear to me.
Thank you very much in advance, sorry about all of the questions.
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
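You can query these limits yourself instead of looking them up. A minimal sketch using the CUDA runtime API (`cudaGetDeviceProperties`); it reads the properties of device 0 and prints the same figures as the list above:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0

    printf("Warp size: %d\n", prop.warpSize);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Max block dims: %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("Max grid dims: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("Multiprocessors: %d\n", prop.multiProcessorCount);
    return 0;
}
```

`multiProcessorCount` also answers the "how many SMs do I have" part of your question directly, with no arithmetic on core counts needed.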
Each block is submitted to a multiprocessor, but only 32 threads (one warp) are executed at a time. Each block can have a maximum of 512 threads, which can be arranged in 3D, while the blocks themselves can only form a 2D grid. So in total, 32 threads × number_of_multiprocessors are executed at a time, but you can launch a kernel with up to 512 × 65535 × 65535 threads.
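To make the block/grid relationship concrete, here is a hedged sketch of a typical launch (the kernel name `fill` and the sizes are made up for illustration): you pick a block size, then compute however many blocks are needed to cover all your elements, regardless of how many SMs or CUDA cores the card has:

```cuda
#include <cuda_runtime.h>

__global__ void fill(float *out, int n) {
    // Global thread index from block and thread coordinates.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f;  // guard: the grid may overshoot n
}

int main() {
    const int n = 100000;  // far more threads than CUDA cores
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));

    dim3 block(256);                         // threads per block (<= 512 here)
    dim3 grid((n + block.x - 1) / block.x);  // enough blocks to cover n
    fill<<<grid, block>>>(d_out, n);

    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```

The hardware schedules the blocks onto SMs for you; blocks beyond what fits at once simply wait their turn.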
I am not sure about that; you will need to try both ways, many blocks or fewer blocks, and see which one is better. Maybe you meant threads, and yes, there is a benefit to having more than 384 threads, because the GPU hides the latency arising from memory reads by pausing stalled threads and executing other ones instead. It always depends on the problem, and you need to test a lot.
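A simple way to run that test is to time one kernel under several launch configurations with CUDA events. A sketch (the memory-bound `copy` kernel is just an example workload I made up; on your own problem, substitute your kernel):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void copy(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];  // memory-bound, so latency hiding matters
}

int main() {
    const int n = 1 << 22;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Sweep the block size; the block count changes with it.
    for (int threads = 64; threads <= 512; threads *= 2) {
        int blocks = (n + threads - 1) / threads;
        cudaEventRecord(start);
        copy<<<blocks, threads>>>(d_in, d_out, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms;
        cudaEventElapsedTime(&ms, start, stop);
        printf("%4d threads/block, %6d blocks: %.3f ms\n", threads, blocks, ms);
    }
    return 0;
}
```

Whichever configuration comes out fastest on your card and your kernel is the right one; there is no universal answer.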