When Are cp.async.bulk.commit_group and wait_group Necessary?

I am studying cp.async.bulk(TMA) and the bulk mechanism based on the following code:

In this example, the TMA-store does not use cp.async.bulk.commit_group or wait_group but still produces correct results. My questions are:

  1. Is the kernel implicitly applying commit_group and wait_group because the kernel execution ends?
  2. In what scenarios are these two instructions mandatory? Must they be included if there are subsequent operations after the bulk async operations?