Do idle (waiting) producer-consumer warps take up occupancy?

202476410arsmart · September 27, 2023, 9:20am

I am learning producer-consumer in CUDA, and I noticed this:

Producer | Consumer
--- | ---
wait for buffer to be ready to be filled | signal buffer is ready to be filled
produce data and fill the buffer |
signal buffer is filled | wait for buffer to be filled
 | consume data in filled buffer
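A minimal CUDA sketch of the table above, assuming one producer warp and four consumer warps in a single block, a two-stage (double-buffered) shared-memory buffer, and PTX named barriers (`bar.sync` / `bar.arrive`, the older spellings of `bar.cta.sync` / `bar.cta.arrive`) as the signalling mechanism. The names `ws_scale`, `run_ws_scale`, `named_sync`, `named_arrive`, the tile size and the barrier IDs are hypothetical; the Programming Guide's own version of this pattern is written with `cuda::barrier` rather than named barriers.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical configuration: 1 producer warp + 4 consumer warps per block.
constexpr int kConsumerWarps = 4;
constexpr int kThreads = (kConsumerWarps + 1) * 32;  // 160 threads per block
constexpr int kTile = kConsumerWarps * 32;           // one element per consumer thread
constexpr int kStages = 2;                           // double buffer

// Named-barrier helpers (inline PTX). Barrier 0 is left to __syncthreads().
// The thread count must be a multiple of the warp size.
__device__ __forceinline__ void named_sync(int id, int count) {
  asm volatile("bar.sync %0, %1;" ::"r"(id), "r"(count) : "memory");
}
__device__ __forceinline__ void named_arrive(int id, int count) {
  asm volatile("bar.arrive %0, %1;" ::"r"(id), "r"(count) : "memory");
}
__device__ __forceinline__ int full_bar(int s) { return 1 + s; }             // "buffer is filled"
__device__ __forceinline__ int empty_bar(int s) { return 1 + kStages + s; }  // "ready to be filled"

// Launch with exactly one block of kThreads threads; computes out[i] = 2 * in[i].
__global__ void ws_scale(const float* in, float* out, int num_tiles) {
  __shared__ float buf[kStages][kTile];
  const int warp = threadIdx.x / 32;
  const int lane = threadIdx.x % 32;
  if (warp == 0) {  // producer warp
    for (int t = 0; t < num_tiles; ++t) {
      const int s = t % kStages;
      // Wait for buffer to be ready to be filled (both stages start out empty).
      if (t >= kStages) named_sync(empty_bar(s), kThreads);
      for (int i = lane; i < kTile; i += 32) buf[s][i] = in[t * kTile + i];
      named_arrive(full_bar(s), kThreads);  // signal buffer is filled
    }
  } else {  // consumer warps
    const int idx = threadIdx.x - 32;
    for (int t = 0; t < num_tiles; ++t) {
      const int s = t % kStages;
      named_sync(full_bar(s), kThreads);          // wait for buffer to be filled
      out[t * kTile + idx] = 2.0f * buf[s][idx];  // consume data in filled buffer
      // Signal buffer is ready to be filled, but only if the producer will refill it,
      // so every arrival is matched by a producer-side wait.
      if (t + kStages < num_tiles) named_arrive(empty_bar(s), kThreads);
    }
  }
}

// Returns the number of mismatching elements, or -1 on a CUDA error.
int run_ws_scale(int num_tiles) {
  const int n = num_tiles * kTile;
  std::vector<float> h_in(n), h_out(n, 0.0f);
  for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);
  float *d_in = nullptr, *d_out = nullptr;
  if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1;
  if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return -1; }
  cudaError_t err = cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    ws_scale<<<1, kThreads>>>(d_in, d_out, num_tiles);
    err = cudaGetLastError();  // launch errors
  }
  if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
  if (err == cudaSuccess)
    err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return -1;
  int bad = 0;
  for (int i = 0; i < n; ++i) bad += (h_out[i] != 2.0f * h_in[i]);
  return bad;
}
```

Called from `main` as, say, `run_ws_scale(8)`, it should return 0 on a working GPU. While a consumer warp sits in `named_sync`, it remains resident as part of its block; it simply has no instruction eligible to issue.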

So when the consumer has nothing to do (for example, while it is waiting), and given that occupancy (the number of warps active at the same time) is fixed, does this consumer take up one "active warp slot"? Or is it idle, letting another warp become active?

By the way, for matmul: the latest version of CUTLASS uses a producer-consumer structure for double-buffered loading. Why? Previous versions did not need this… the loading and the computation overlapped each other implicitly…

Robert_Crovella · September 27, 2023, 3:03pm

In my view a warp slot is that thing that corresponds to the specification item:

Maximum number of resident warps per SM

in this table (technical specifications per compute capability) in the CUDA C++ Programming Guide.

In that sense, each warp in a typical warp-specialized producer-consumer arrangement would take up a warp slot: the block it belongs to has been scheduled to an SM, so each of the block's warps takes up a warp slot.

IMO, the question of whether the warp will be active (i.e. have instructions that can be scheduled by the warp scheduler it is assigned to) or idle (not have instructions that can be scheduled) while it is waiting to consume work can only be answered with a code example.

However, if we use the example here, consider the consumer warps: they executed bar.cta.sync and are waiting at a numbered barrier, and they have not been released because the producer warps have not yet signalled that data is available to consume. We would say that those warps have no instructions that can be scheduled by the warp scheduler they are assigned to.

But they are considered for occupancy. They do count as occupying warp slots on the SM or SMSP.
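One way to see that waiting warps still count: the runtime occupancy API takes only the kernel (and thus its register and shared-memory footprint), the block size and the dynamic shared-memory size. It has no notion of which warps are producers or will spend time waiting. The sketch assumes a hypothetical 160-thread kernel `ws_kernel` with one producer warp and four consumer warps, converts the result of `cudaOccupancyMaxActiveBlocksPerMultiprocessor` into resident warps, and compares that with the per-SM limit derived from `cudaDeviceProp::maxThreadsPerMultiProcessor`. `WarpOccupancy` and `query_ws_occupancy` are illustrative names.

```cuda
#include <cuda_runtime.h>

// Hypothetical warp-specialized kernel; assumes blockDim.x == 160.
// Warp 0 ("producer") stages data in shared memory; warps 1-4 ("consumers") use it.
__global__ void ws_kernel(float* data) {
  __shared__ float buf[128];
  const int warp = threadIdx.x / 32;
  if (warp == 0)
    for (int i = threadIdx.x; i < 128; i += 32) buf[i] = data[i];
  __syncthreads();  // consumer warps wait here with nothing to issue
  if (warp != 0) data[threadIdx.x - 32] = 2.0f * buf[threadIdx.x - 32];
}

struct WarpOccupancy {
  int blocks_per_sm;     // resident blocks per SM for this kernel and block size
  int resident_warps;    // every warp of every resident block, producer or consumer
  int max_warps_per_sm;  // "Maximum number of resident warps per SM"
};

// Fills *occ for the current device; returns false if a CUDA call fails.
bool query_ws_occupancy(int block_size, WarpOccupancy* occ) {
  int dev = 0;
  cudaDeviceProp prop;
  if (cudaGetDevice(&dev) != cudaSuccess) return false;
  if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return false;
  int blocks = 0;
  if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks, ws_kernel, block_size, 0) !=
      cudaSuccess)
    return false;
  // Warps are counted per block; the role each warp plays is not an input.
  const int warps_per_block = (block_size + prop.warpSize - 1) / prop.warpSize;
  occ->blocks_per_sm = blocks;
  occ->resident_warps = blocks * warps_per_block;
  occ->max_warps_per_sm = prop.maxThreadsPerMultiProcessor / prop.warpSize;
  return true;
}
```

Nothing in the query depends on what each warp does at run time, so the waiting consumer warps are counted exactly like the producer warp; only the per-block resource footprint matters.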

202476410arsmart · September 28, 2023, 12:24pm

In that case, dedicating one warp specifically as the producer will decrease the number of warps actually doing work, right?

The only benefit here, then, is that we can use TMA block…
