In my application, some thread blocks are producers, while the others are consumers. In my current implementation, the consumer blocks busy-wait until the data are ready. How can I put the consumer blocks to sleep so that they only check periodically whether the data are ready?
busy-waiting is a CPU convention, and in most cases it is hardly efficient, not even on the CPU
having the threads of a thread block busy-wait is probably not the best of strategies
the cheapest way to busy-wait might be to use dynamic parallelism: a relatively small kernel (1 warp) busy-waits on a global memory flag, and then launches the main kernel once the flag is set
I believe the best way is still to use events to know when the data is ready; with events one can even issue the consuming work ahead of time, before the data exists
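To make the events suggestion concrete, here is a minimal host-side sketch, assuming the producer and the consumer can be split into two separate kernels. The kernels produce and consume and the helper run_with_events are hypothetical names made up for illustration; the runtime calls (cudaEventRecord, cudaStreamWaitEvent) are the standard stream/event API. The consumer is enqueued right away, but the GPU does not start it until the producer has finished, so nothing spins.

```cuda
#include <cuda_runtime.h>

// Hypothetical producer: fills the buffer with 0..n-1.
__global__ void produce(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Hypothetical consumer: doubles every element the producer wrote.
__global__ void consume(const int *data, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * data[i];
}

// Returns the last consumed element, 2 * (n - 1), or -1 if something failed.
int run_with_events(int n)
{
    int *data = nullptr, *out = nullptr, last = -1;
    cudaStream_t producerStream, consumerStream;
    cudaEvent_t dataReady;

    if (cudaMalloc(&data, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&out, n * sizeof(int)) != cudaSuccess) { cudaFree(data); return -1; }
    cudaStreamCreate(&producerStream);
    cudaStreamCreate(&consumerStream);
    cudaEventCreateWithFlags(&dataReady, cudaEventDisableTiming);

    int threads = 256, blocks = (n + threads - 1) / threads;

    // Producer runs in its own stream; the event marks "data is ready".
    produce<<<blocks, threads, 0, producerStream>>>(data, n);
    cudaEventRecord(dataReady, producerStream);

    // Consumer is issued immediately, but its stream waits on the event.
    cudaStreamWaitEvent(consumerStream, dataReady, 0);
    consume<<<blocks, threads, 0, consumerStream>>>(data, out, n);

    if (cudaStreamSynchronize(consumerStream) == cudaSuccess)
        cudaMemcpy(&last, out + n - 1, sizeof(int), cudaMemcpyDeviceToHost);

    cudaEventDestroy(dataReady);
    cudaStreamDestroy(producerStream);
    cudaStreamDestroy(consumerStream);
    cudaFree(data);
    cudaFree(out);
    return last;
}
```

Calling run_with_events(1024) should hand back the doubled last element. The point of the design is that the dependency lives in the stream scheduler rather than in a polling loop on the device.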
you have not provided much background information
Suppose I have two thread blocks. One is the producer and the other is the consumer. When the producer finishes producing the data, it sets ready (a global variable) to 1. Now I let the consumer busy wait (i.e., while(ready!=1); ). I’m wondering whether there is an efficient way to achieve the same goal.
If you don’t want to just hammer on global memory, you can delay a certain period of time by polling using the clock() or clock64() functions. They are described in the programming guide.
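A rough sketch of what such a delayed poll could look like, assuming one kernel with two blocks that are both resident on the GPU at the same time (the programming model does not guarantee this, so treat it as an assumption). The names spin_delay, producer_consumer and run_polling, the value 42, and the backoff limits are all made up for illustration; only clock64(), __threadfence() and the volatile flag read are standard.

```cuda
#include <cuda_runtime.h>

// Burn roughly `cycles` clock ticks without touching global memory.
__device__ void spin_delay(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Block 0 produces, block 1 consumes. Assumes both blocks are co-resident.
__global__ void producer_consumer(volatile int *ready, volatile int *data, int *out)
{
    if (blockIdx.x == 0) {                 // producer block
        if (threadIdx.x == 0) {
            *data = 42;                    // hypothetical payload
            __threadfence();               // publish the data before the flag
            *ready = 1;
        }
    } else {                               // consumer block
        if (threadIdx.x == 0) {
            long long delay = 64;          // illustrative starting backoff
            while (*ready != 1) {          // volatile read: re-fetched each time
                spin_delay(delay);
                if (delay < 8192) delay *= 2;   // exponential backoff, capped
            }
            __threadfence();               // order the flag read before the data read
            *out = *data;
        }
        __syncthreads();                   // rest of the block waits on thread 0 only
    }
}

// Returns the value the consumer saw, or -1 if something failed.
int run_polling()
{
    int *buf = nullptr, result = -1;       // buf[0]=ready, buf[1]=data, buf[2]=out
    if (cudaMalloc(&buf, 3 * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(buf, 0, 3 * sizeof(int));
    producer_consumer<<<2, 32>>>(buf, buf + 1, buf + 2);
    if (cudaDeviceSynchronize() == cudaSuccess)
        cudaMemcpy(&result, buf + 2, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(buf);
    return result;
}
```

Only one thread per consumer block polls, and the backoff keeps it off the memory system most of the time. This is still busy-waiting, just with fewer global memory reads.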
As little_jimmy already said, this very much sounds like the wrong concept for a GPU. Have you thought about the necessary measures to avoid deadlocks, e.g. when a consumer runs first and its busy-waiting prevents the producer from ever getting to run?
One option would be to assign the producer/consumer roles dynamically. At the start of each block, check the amount of data available. If there is enough data available, assume the consumer role. Otherwise, be a producer, so that following blocks will find more data.
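A skeleton of the dynamic role assignment might look like the following. It is only a sketch under assumptions: the data is represented by a bare item counter, BATCH is an arbitrary batch size, and try_claim, adaptive_block and run_adaptive are hypothetical names. The real production and consumption work would go where the comments indicate. Since no block ever waits for another, the deadlock scenario cannot occur.

```cuda
#include <cuda_runtime.h>

#define BATCH 4   // hypothetical number of items a consumer needs

// Atomically take BATCH items if at least that many are available.
__device__ bool try_claim(int *available, int batch)
{
    int seen = atomicAdd(available, 0);            // atomic read
    while (seen >= batch) {
        int prev = atomicCAS(available, seen, seen - batch);
        if (prev == seen) return true;             // claim succeeded
        seen = prev;                               // someone else changed it, retry
    }
    return false;
}

__global__ void adaptive_block(int *available, int *produced, int *consumed)
{
    __shared__ bool isConsumer;
    if (threadIdx.x == 0)
        isConsumer = try_claim(available, BATCH);  // one decision per block
    __syncthreads();

    if (isConsumer) {
        // ... consume one batch here ...
        if (threadIdx.x == 0) atomicAdd(consumed, 1);
    } else {
        // ... produce one batch here ...
        __syncthreads();
        if (threadIdx.x == 0) {
            __threadfence();                       // publish data before the count
            atomicAdd(available, BATCH);
            atomicAdd(produced, 1);
        }
    }
}

// Checks the bookkeeping: every block took exactly one role, and the
// leftover item count matches producers minus consumers.
bool run_adaptive(int numBlocks)
{
    int *c = nullptr;
    int h[3] = {0, 0, 0};                          // available, produced, consumed
    if (cudaMalloc(&c, sizeof(h)) != cudaSuccess) return false;
    cudaMemset(c, 0, sizeof(h));
    adaptive_block<<<numBlocks, 64>>>(c, c + 1, c + 2);
    bool ok = cudaDeviceSynchronize() == cudaSuccess &&
              cudaMemcpy(h, c, sizeof(h), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(c);
    return ok && h[1] + h[2] == numBlocks && h[0] == BATCH * (h[1] - h[2]);
}
```

How many blocks end up as producers versus consumers depends on scheduling, so only the invariants are checked, not the split itself.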
Another option might be to assign the producer/consumer roles on a per-warp basis within each block instead. PTX has dedicated synchronization instructions for this (named barriers, via bar.sync and bar.arrive).
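For the per-warp variant, a minimal sketch using PTX named barriers through inline assembly could look like this. It assumes a block of exactly 64 threads (warp 0 produces, warp 1 consumes) and uses barrier 1, leaving barrier 0 to __syncthreads(). The names warp_roles and run_warp_roles are hypothetical, and the __threadfence_block() before the arrive is a cautious addition on my part rather than something the thread prescribes.

```cuda
#include <cuda_runtime.h>

// One block of 64 threads: warp 0 is the producer, warp 1 the consumer.
__global__ void warp_roles(int *out)
{
    __shared__ int slot[32];
    int lane = threadIdx.x % 32;

    if (threadIdx.x < 32) {
        // Producer warp: deposit values, signal, and carry on without waiting.
        slot[lane] = lane * lane;
        __threadfence_block();                           // make the stores visible first
        asm volatile("bar.arrive 1, 64;" ::: "memory");  // barrier 1, 64 participants
    } else {
        // Consumer warp: sleeps in hardware until the producer warp has arrived.
        asm volatile("bar.sync 1, 64;" ::: "memory");
        out[lane] = slot[lane] + 1;
    }
}

// Returns what the consumer wrote for the given lane, or -1 on failure.
int run_warp_roles(int lane)
{
    int *out = nullptr, result = -1;
    if (lane < 0 || lane >= 32) return -1;
    if (cudaMalloc(&out, 32 * sizeof(int)) != cudaSuccess) return -1;
    warp_roles<<<1, 64>>>(out);
    if (cudaDeviceSynchronize() == cudaSuccess)
        cudaMemcpy(&result, out + lane, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(out);
    return result;
}
```

Unlike a spin loop, the waiting warp is parked by the barrier hardware, and because both warps live in the same block they are guaranteed to be resident together.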
Maybe you want to share some more info about the problem you want to solve?
“When the producer finishes producing the data, it sets ready (a global variable) to 1. Now I let the consumer busy wait (i.e., while(ready!=1); )”
this producer/consumer inter-dependency is something you can implement via events, I would think
tera also made good suggestions
and txbob is right: constantly polling global memory is not really that smart
another alternative might be the use of a global work counter, kept as a global atomic, with the following rules:
the producer launches the consumer via dynamic parallelism whenever it increments the atomic and the previous count returned is 0, i.e. when it knows the consumer is not currently running
the consumer is expected to terminate when it decrements the atomic to 0
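The two rules above can be sketched roughly as follows. This is a bare-bones illustration, not a full queue: the work items are anonymous, so the consumer just counts how many it handled, and the names consumer, producers and run_counter are hypothetical. Device-side launches need relocatable device code, so this assumes compilation with something like nvcc -rdc=true and linking against cudadevrt.

```cuda
#include <cuda_runtime.h>

// Consumer kernel: handles one item per loop iteration and terminates when
// its own decrement takes the counter to 0.
__global__ void consumer(int *pending, int *consumed)
{
    do {
        atomicAdd(consumed, 1);              // stand-in for consuming one item
    } while (atomicSub(pending, 1) != 1);    // old value 1 means the count is now 0
}

// Every thread acts as a producer of exactly one item.
__global__ void producers(int *pending, int *consumed)
{
    // ... produce one item here ...
    __threadfence();                         // publish the item before counting it
    if (atomicAdd(pending, 1) == 0)          // previous count 0: no consumer is active
        consumer<<<1, 1>>>(pending, consumed);
}

// Returns the number of items consumed, or -1 on failure.
int run_counter(int blocks, int threads)
{
    int *c = nullptr, consumedCount = -1;    // c[0]=pending, c[1]=consumed
    if (cudaMalloc(&c, 2 * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(c, 0, 2 * sizeof(int));
    producers<<<blocks, threads>>>(c, c + 1);
    // The parent grid is not complete until all child grids have finished.
    if (cudaDeviceSynchronize() == cudaSuccess)
        cudaMemcpy(&consumedCount, c + 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(c);
    return consumedCount;
}
```

A new consumer is only launched after the previous one has made its final decrement, so at most one consumer is actively working at any time, and every increment is matched by exactly one decrement.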
“Suppose I have two thread blocks.”
unless you follow the suggestions of tera, I do not think it can be 2 thread blocks; it would rather have to be 2 kernels
thread blocks are more constrained than kernels: kernels can launch, terminate, and relaunch; thread blocks are far more limited in this regard