I came across this instruction: fence.proxy.async.shared::cta. Could someone explain the typical use cases for this instruction, particularly in GEMM (General Matrix Multiplication) scenarios?
From PTX, I learned that there are two types of proxies: async and generic (is that correct? Are there others?). Async is easy to understand; it should refer to operations like cp.async (is that correct?). What about generic? (I know mbarrier.init belongs to generic, is that correct? Are there other examples?)
IMO, your understanding is correct.
You can reread the relevant section of the PTX ISA 8.5 documentation to reinforce your understanding.
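For a concrete picture, here is a minimal sketch of one pattern where this fence typically shows up, for example in a GEMM epilogue: threads write a tile to shared memory with ordinary stores (generic proxy), and a single thread then hands that tile to a bulk asynchronous copy, which the PTX ISA documents as running in the async proxy. The kernel name `stage_and_store`, the size `kN`, and the data pattern are made up for illustration, and the sketch assumes an sm_90 (Hopper) GPU compiled with `-arch=sm_90`. It is not taken from any particular library.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kN = 256;  // hypothetical tile size (elements), chosen for illustration

__global__ void stage_and_store(float* out) {
    // Bulk copies need 16-byte aligned addresses and a size that is a multiple of 16.
    __shared__ alignas(128) float tile[kN];

    // 1) Ordinary shared-memory store: this goes through the generic proxy.
    tile[threadIdx.x] = 2.0f * static_cast<float>(threadIdx.x);

    // 2) Make this thread's generic-proxy writes visible to the async proxy.
    asm volatile("fence.proxy.async.shared::cta;" ::: "memory");

    // 3) Wait until every thread has written and fenced its element.
    __syncthreads();

    // 4) One thread issues the bulk copy shared -> global (async proxy).
    if (threadIdx.x == 0) {
        uint32_t smem_addr = static_cast<uint32_t>(__cvta_generic_to_shared(tile));
        uint32_t bytes = static_cast<uint32_t>(sizeof(tile));
        asm volatile(
            "cp.async.bulk.global.shared::cta.bulk_group [%0], [%1], %2;"
            :: "l"(out), "r"(smem_addr), "r"(bytes) : "memory");
        asm volatile("cp.async.bulk.commit_group;" ::: "memory");
        // Wait for the committed bulk group to complete before the kernel exits.
        asm volatile("cp.async.bulk.wait_group 0;" ::: "memory");
    }
}

int main() {
    float* d_out = nullptr;
    cudaMalloc(&d_out, kN * sizeof(float));
    stage_and_store<<<1, kN>>>(d_out);

    float h_out[kN];
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Compare against the values the kernel was supposed to stage.
    bool ok = true;
    for (int i = 0; i < kN; ++i) ok = ok && (h_out[i] == 2.0f * i);
    printf("%s\n", ok ? "match" : "mismatch");

    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

The ordering is the point of the sketch: write, then fence, then barrier, then issue the async operation. Without the proxy fence, the bulk copy is not guaranteed to observe the values written by the ordinary stores, even after `__syncthreads()`.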