Relationship between Warp and Thread Block on SM

I’m currently studying CUDA programming and am confused about the relationship between warps and thread blocks per SM.

On compute capability 9.0, the maximum number of resident blocks per SM is 32, and the maximum number of threads per block is 1024.

So I thought the maximum number of resident warps per SM should be 32 * (1024 / 32) = 1024, where the divisor 32 is the warp size.

But it is actually 64.

Can you explain why, and how this number is calculated?

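A maximum of 32 resident blocks per SM does not imply that 32 blocks of 1024 threads each fit on an SM. The per-SM limits on blocks and on threads are separate, and both must hold at the same time.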

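The maximum number of resident threads per SM is 2048, which equals 2048 / 32 = 64 warps. With 1024-thread blocks, at most two blocks can therefore be resident on an SM at once.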
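To make the arithmetic concrete, here is a minimal host-side sketch in C++ (it needs no GPU to compile). It assumes the compute capability 9.0 figures quoted in this thread and considers only the block and thread limits; register and shared memory usage, which can lower occupancy further, are ignored. The constant and function names are hypothetical, chosen for illustration only.

```cpp
// Per-SM limits for compute capability 9.0, as quoted in this thread.
constexpr int kWarpSize        = 32;
constexpr int kMaxBlocksPerSM  = 32;
constexpr int kMaxThreadsPerSM = 2048;
constexpr int kMaxWarpsPerSM   = kMaxThreadsPerSM / kWarpSize;  // 64

constexpr int min_int(int a, int b) { return a < b ? a : b; }

// A block occupies whole warps, so round the thread count up to a warp multiple.
constexpr int warps_per_block(int block_size) {
    return (block_size + kWarpSize - 1) / kWarpSize;
}

// Blocks resident on one SM: whichever of the two limits is hit first.
// Assumption: registers and shared memory are not limiting factors.
constexpr int resident_blocks(int block_size) {
    return min_int(kMaxBlocksPerSM, kMaxWarpsPerSM / warps_per_block(block_size));
}

// Warps resident on one SM for a given block size.
constexpr int resident_warps(int block_size) {
    return resident_blocks(block_size) * warps_per_block(block_size);
}

// 1024-thread blocks: the thread limit binds, so 2 blocks and 64 warps.
static_assert(resident_blocks(1024) == 2, "thread limit binds");
static_assert(resident_warps(1024) == 64, "64 warps per SM");
// 32-thread blocks: the block limit binds, so 32 blocks but only 32 warps.
static_assert(resident_blocks(32) == 32, "block limit binds");
static_assert(resident_warps(32) == 32, "only half the warp slots used");
```

Under these assumptions, the limit of 32 blocks only matters for small blocks: with 32-thread blocks the SM holds 32 blocks but only 32 of its 64 possible warps, while blocks of 64 threads or more run into the 2048-thread limit first.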
