Automatic warp aggregation by PGI Fortran compiler

I have learned from a couple of articles that the NVCC compiler is able to perform warp aggregation for atomic operations (e.g., …). Does the PGI Fortran compiler have a similar capability?

You should be able to replicate this in CUDA Fortran using Cooperative Groups. See:

I meant to ask whether the PGI Fortran compiler can do it for me automatically. There is a note at the very top of the link I posted which says: “The NVCC compiler now performs warp aggregation for atomics automatically in many cases, …”. My question is whether the PGI Fortran compiler is able to do the same.

I am pretty sure CUDA Fortran does not do this.
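
For reference, here is a minimal sketch of what doing the aggregation by hand with Cooperative Groups looks like. It is written in CUDA C++ rather than CUDA Fortran, so it only illustrates the technique and would still need to be ported. The names atomicAggInc, filterPositive, and countPositiveOnGpu are made up for this example, and error checking is left out for brevity.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical helper: warp-aggregated increment.
// All currently active threads of the warp share a single atomicAdd.
__device__ int atomicAggInc(int *counter) {
    cg::coalesced_group active = cg::coalesced_threads();
    int base = 0;
    if (active.thread_rank() == 0) {
        // The group leader reserves one slot per active thread.
        base = atomicAdd(counter, static_cast<int>(active.size()));
    }
    // Broadcast the leader's base index, then offset by each thread's rank.
    return active.shfl(base, 0) + static_cast<int>(active.thread_rank());
}

// Hypothetical kernel: copy the positive elements of src into dst
// (in unspecified order) and count them.
__global__ void filterPositive(const int *src, int n, int *dst, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && src[i] > 0) {
        dst[atomicAggInc(count)] = src[i];
    }
}

// Hypothetical host wrapper: returns how many elements were kept.
int countPositiveOnGpu(const std::vector<int> &host) {
    int n = static_cast<int>(host.size());
    if (n == 0) return 0;

    int *src = nullptr, *dst = nullptr, *count = nullptr;
    cudaMalloc(&src, n * sizeof(int));
    cudaMalloc(&dst, n * sizeof(int));
    cudaMalloc(&count, sizeof(int));
    cudaMemcpy(src, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(count, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    filterPositive<<<blocks, threads>>>(src, n, dst, count);

    int kept = 0;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&kept, count, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(src);
    cudaFree(dst);
    cudaFree(count);
    return kept;
}

int main() {
    std::vector<int> data = {3, -1, 4, -1, 5, -9, 2, 6};
    printf("kept %d elements\n", countPositiveOnGpu(data));
    return 0;
}
```

The point of the helper is that only one thread per group of active threads touches the counter, so contention on the atomic drops by up to a factor of the warp size. Whether this beats a plain atomicAdd in a particular kernel is something to measure rather than assume.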