What is the exact meaning of `waves per SM`?

cuda_new_bird · October 8, 2024, 3:16am

Hi all, I’m profiling a kernel on an H100, which has 132 SMs.

When I launch the kernel with 2640 CUDA blocks, the waves per SM is 10.

132 * 10 = 1320, so I guess at most 1320 blocks can be scheduled at the same time. Is that correct?

When I change the grid size to 1320, the waves per SM becomes 5. Does that mean at most 660 blocks (132 * 5) can be scheduled at the same time?

I’m really confused about this. Waves per SM times the number of SMs always comes out to half of the total blocks. Why is that?

wllqwzx · October 8, 2024, 4:10am

A wave is the group of thread blocks (and therefore their warps) that can run in parallel, given how many blocks fit on each SM at once. The number of waves per SM is calculated as: number of blocks / (max blocks per SM × number of SMs).

In your case, with 2640 blocks and 132 SMs, the waves per SM is 10, so max blocks per SM = 2640 / 132 / 10 = 2. This means two blocks can run in parallel on one SM.

The number of blocks that can run in parallel on an SM is determined by the resources available on the SM, usually registers, shared memory, and warp slots. This value can be queried with the CUDA API `cudaOccupancyMaxActiveBlocksPerMultiprocessor`.
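To make the arithmetic concrete, here is a minimal sketch in plain Python (no GPU needed). The function name `waves_per_sm` is made up for illustration, and the value of 2 blocks per SM is inferred from the numbers in the question rather than measured; on a real kernel it would come from the occupancy query mentioned above.

```python
def waves_per_sm(num_blocks, blocks_per_sm, num_sms):
    """Grid size divided by the number of blocks the whole GPU can hold at once."""
    return num_blocks / (blocks_per_sm * num_sms)


NUM_SMS = 132       # H100, as stated in the question
BLOCKS_PER_SM = 2   # assumption: inferred from 2640 / 132 / 10

# Upper bound on blocks resident on the GPU at the same time.
# It depends on the kernel's resource usage, not on the grid size.
max_concurrent_blocks = BLOCKS_PER_SM * NUM_SMS
print("max concurrent blocks:", max_concurrent_blocks)  # 264

for grid_size in (2640, 1320):
    print(grid_size, "blocks ->", waves_per_sm(grid_size, BLOCKS_PER_SM, NUM_SMS), "waves per SM")
# 2640 blocks -> 10.0 waves per SM
# 1320 blocks -> 5.0 waves per SM
```

Under that assumption, the number of blocks scheduled at the same time stays at 264 for both grid sizes; halving the grid only halves the number of waves needed to work through it.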

rs277 · October 8, 2024, 4:17am

This post on the “Tail Effect” may help as well.

cuda_new_bird · October 8, 2024, 4:22am

Thank you! I had misunderstood this.

So one wave on an SM contains 2 CUDA blocks, and one SM has 10 waves to execute. Is that right?

cuda_new_bird · October 8, 2024, 4:23am

Thank you. I’ll read it. I’ve been dealing with an imbalance issue lately.

Could you please read this post? Thank you!

