I am studying `cp.async.bulk` (TMA) and the bulk mechanism based on the following code:

In this example, the TMA store does not use `cp.async.bulk.commit_group` or `wait_group`, but still produces correct results. My questions are:
- Is the kernel implicitly applying `commit_group` and `wait_group` because the kernel execution ends?
- In what scenarios are these two instructions mandatory? Must they be included if there are subsequent operations after the bulk async operations?
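
To make the questions concrete, here is a minimal sketch of a TMA store that does issue both instructions. It is an illustration under assumptions, not the example referred to above: the kernel name `tma_store_sketch`, the buffer size `N`, and the use of inline PTX are all hypothetical choices, and it assumes a Hopper-class GPU compiled with something like `nvcc -arch=sm_90`.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical size: 256 ints = 1024 bytes (bulk copies need a multiple of 16 bytes).
constexpr int N = 256;

__global__ void tma_store_sketch(int* out)
{
    // The shared-memory source of a bulk copy must be 16-byte aligned.
    __shared__ alignas(16) int smem[N];

    for (int i = threadIdx.x; i < N; i += blockDim.x) {
        smem[i] = i;
    }

    // Make the ordinary shared-memory writes visible to the async proxy (TMA),
    // then make sure every thread has passed its fence before the copy starts.
    asm volatile("fence.proxy.async.shared::cta;" ::: "memory");
    __syncthreads();

    if (threadIdx.x == 0) {
        uint64_t dst   = static_cast<uint64_t>(__cvta_generic_to_global(out));
        uint32_t src   = static_cast<uint32_t>(__cvta_generic_to_shared(smem));
        uint32_t bytes = static_cast<uint32_t>(sizeof(smem));

        // Bulk store shared -> global, tracked by a bulk async-group.
        asm volatile(
            "cp.async.bulk.global.shared::cta.bulk_group [%0], [%1], %2;"
            :: "l"(dst), "r"(src), "r"(bytes) : "memory");

        // Close the group, then wait until the copy has finished READING smem.
        asm volatile("cp.async.bulk.commit_group;" ::: "memory");
        asm volatile("cp.async.bulk.wait_group.read 0;" ::: "memory");
    }
    __syncthreads();

    // A "subsequent operation": reusing the source buffer after the wait.
    for (int i = threadIdx.x; i < N; i += blockDim.x) {
        smem[i] = 0;
    }
}

int main()
{
    int* d_out = nullptr;
    cudaMalloc(&d_out, N * sizeof(int));  // cudaMalloc returns memory aligned well beyond 16 bytes

    tma_store_sketch<<<1, 128>>>(d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_out[N];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Compare against the values written to shared memory before the store.
    bool ok = true;
    for (int i = 0; i < N; ++i) {
        ok = ok && (h_out[i] == i);
    }
    std::printf("%s\n", ok ? "match" : "mismatch");

    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

In this sketch the wait only guards the later reuse of `smem`: the `.read` form waits until the source has been read, not until the global writes are visible. Whether both instructions can be dropped when nothing follows the store is what the first question asks.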