What is the exact meaning of `waves per SM`?

Hi all, I’m profiling a kernel on an H100, which has 132 SMs.

When I launch the kernel with 2640 CUDA blocks, the reported waves per SM is 10.

132 * 10 = 1320, so I guess at most 1320 blocks can be scheduled at the same time. Is that correct?

When I change the grid size to 1320, waves per SM becomes 5. Does that mean at most 660 blocks (132 * 5) can be scheduled at the same time?

I’m really confused by this. Waves per SM always comes out to half the number of blocks per SM. Why is that?

A wave is the set of thread blocks that can be resident and running at the same time. Waves per SM is calculated as the number of blocks / (max resident blocks per SM × the number of SMs).

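In your case, with 2640 blocks and 132 SMs, each SM receives 2640 / 132 = 20 blocks. Since waves per SM is 10, each wave holds 20 / 10 = 2 blocks per SM. In other words, two blocks can run in parallel on one SM, so at most 132 × 2 = 264 blocks are resident at once, not 1320.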
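To make the arithmetic concrete, here is a minimal Python sketch of the formula above. The function names are made up for illustration, and the value of 2 resident blocks per SM is inferred from the numbers you reported, not measured.

```python
def blocks_per_wave(num_sms: int, max_blocks_per_sm: int) -> int:
    """Blocks that can be resident on the whole GPU at once (one full wave)."""
    return num_sms * max_blocks_per_sm


def waves_per_sm(grid_blocks: int, num_sms: int, max_blocks_per_sm: int) -> float:
    """Waves per SM = grid blocks / (resident blocks per SM * number of SMs)."""
    return grid_blocks / blocks_per_wave(num_sms, max_blocks_per_sm)


def full_and_partial_waves(grid_blocks: int, num_sms: int, max_blocks_per_sm: int):
    """Return (number of full waves, blocks left over for a final partial wave)."""
    return divmod(grid_blocks, blocks_per_wave(num_sms, max_blocks_per_sm))


if __name__ == "__main__":
    NUM_SMS = 132            # H100, from the question
    MAX_BLOCKS_PER_SM = 2    # assumption: inferred from the reported waves per SM
    for grid in (2640, 1320):
        print(grid, "blocks:", waves_per_sm(grid, NUM_SMS, MAX_BLOCKS_PER_SM), "waves per SM")
```

Under these assumptions one wave is 264 blocks, so a grid of 2700 blocks would need 10 full waves plus a partial wave of 60 blocks. That last partial wave leaves some SMs idle.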

The number of blocks that can run in parallel on an SM is determined by the resources available on the SM, chiefly registers, shared memory, and warp slots. This value can be queried with the CUDA runtime API `cudaOccupancyMaxActiveBlocksPerMultiprocessor`.

This post on the “Tail Effect” may help as well.

Thank you! I had misunderstood this.

So one wave on an SM contains 2 CUDA blocks, and each SM has 10 waves to execute. Is that right?

Thank you. I’ll read it. I’ve been dealing with an imbalance issue lately.

Could you please read this post? Thank you!
