I have the following code:
    !$acc kernels present(vp)
      vp%r = 0.
      vp%t = 0.
      vp%p = 0.
    !$acc end kernels
where I am using deepcopy for “vp”.
The code works, but when I look at the compiler output, I see:
    28047, Loop is parallelizable
           Accelerator scalar kernel generated
           Accelerator kernel generated
           Generating Tesla code
        28047, !$acc loop gang, vector(4)  ! blockidx%z threadidx%y
               !$acc loop gang, vector(32) ! blockidx%x threadidx%x
               !$acc loop gang             ! blockidx%y
(Line 28047 is one of the three assignments in the above kernels region.)
To me, the chosen arrangement of CUDA blocks and threads seems strange. vp%r is a 3D array which appears to be mapped to [BLK Y][BLK X, THREAD X][BLK Z, THREAD Y].
I have never seen a dimension assigned to a block/thread combination from different grid directions (e.g., blockIdx%z paired with threadIdx%y) before.
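For comparison, one way to make the schedule explicit is to replace the array-syntax assignment with explicit loops under a parallel loop construct (a sketch, not my actual code; n1, n2, n3 are placeholder extents for vp%r, and the collapse clause is just one possible choice):

    !$acc parallel loop collapse(3) present(vp)
    do k = 1, n3
      do j = 1, n2
        do i = 1, n1
          vp%r(i,j,k) = 0.
        end do
      end do
    end do
    !$acc end parallel loop

With explicit loops like this, the compiler feedback reports one schedule per loop, which makes the loop-to-dimension mapping easier to read than with the array-syntax form.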
What is actually happening here?
What does "Accelerator scalar kernel generated" mean, as compared to "Accelerator kernel generated"?
Does this mean a non-parallel kernel is being created as well?