I am reading this, but I do not understand why we need different memory fences. Here we have a barrier::wait and then a forced barrier sync. Why not just have a single forced sync? I mean, a warp that has finished its work will automatically continue, right?
A typical use case is a warp-specialized producer-consumer design. In that case we need several pieces of functionality:
- producer warps need to be able to signal to consumer warps when data is ready to be consumed (controlling consumer warp execution behavior)
- the data produced by producer warps needs to be made explicitly visible to consumer warps (memory barrier functionality)
- although producer warps need to be able to signal consumer warps as per item 1, we might prefer that they not be forced to wait at a barrier. We'd like them to be able to go on immediately and begin collecting the next tranche of data. This is often the case when we use two separate barriers and ping-pong buffers, so that one buffer can be filled while the other is being consumed. In that case we may need execution barrier functionality here as well, so that a producer warp that has finished filling buffer 2 does not go on to refill buffer 1 until the consumer warps have signaled that they are finished consuming buffer 1.
- with respect to consumer warps, we definitely need the functionality that forces them to wait at a barrier until the signal is received that the data is ready (execution barrier functionality)
- with respect to consumer warps, we may also need the functionality that allows them to signal to producer warps that they have finished consuming a buffer, as indicated in item 3.
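You cannot achieve all of the above with a single mechanism. Memory visibility and execution behavior are two separate aspects of CUDA. That is why the barrier lets you arrive (signal) and wait as separate steps: a producer can arrive without waiting, while a consumer waits until the data it needs is ready and visible.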
Yeah, put that way, I understand. The producer-consumer pattern is definitely useful. Thanks!!
By the way, are there other examples? I guess… cuda::barrier is quite a new feature, right? I did not notice it last year. It seems to be related to fine-grained synchronization. Hmm, sounds useful!