How is the number of concurrent threads calculated?

Hi all,
I read the following in a slide deck:
"Contemporary (Fermi) GPU Architecture

32 CUDA Cores per Streaming Multiprocessor (SM)
32 fp32 ops/clock
16 fp64 ops/clock
32 int32 ops/clock
2 Warp schedulers per SM
1,536 concurrent threads
4 special-function units
64KB shared memory + L1 cache
32K 32-bit registers

Fermi GPUs have as many as 16 SMs
24,576 concurrent threads "

I can't figure out how 1,536 is obtained, even though I have read the whitepaper "NVIDIA's Next Generation CUDA Compute Architecture: Fermi".
Is the number of concurrent threads related to the number of warp schedulers?
In my view, the number should be determined by the hardware resources.

Thanks in advance.


The number 1,536 is per SM: each SM can hold up to 48 resident warps of 32 threads each (48 * 32 = 1536).


In the document “NVIDIA CUDA C Programming Guide”

Appendix G. Compute Capabilities

G.1 Features and Technical Specifications

Warp Size: 32

Maximum number of resident warps per multiprocessor: 48 (2.x)

1536 = 32 * 48

The same document also says:

“The multiprocessor creates, manages, schedules, and executes threads in groups of 32 parallel threads called warps.”

What does “schedule” mean here?

How often does a scheduling decision occur?



"Schedules" here means that the multiprocessor decides the order in which the warps execute: each clock, each warp scheduler picks a warp that is ready (not stalled on memory or dependencies) and issues its next instruction.

Question: if the MP executes threads in groups of 32, does computing 33 threads take the same time as computing 64 threads?