Hi,
I have the following code:
!$acc kernels present(vp)
vp%r=0.
vp%t=0.
vp%p=0.
!$acc end kernels
where I am using deepcopy for “vp”.
The code works, but when I look at the compiler output, I see:
28047, Loop is parallelizable
       Accelerator scalar kernel generated
       Accelerator kernel generated
       Generating Tesla code
    28047, !$acc loop gang, vector(4) ! blockidx%z threadidx%y
           !$acc loop gang, vector(32) ! blockidx%x threadidx%x
           !$acc loop gang ! blockidx%y
(Line 28047 is one of the three assignments in the above kernels region.)
To me, the chosen arrangement of CUDA blocks and threads seems strange. vp%r is a 3D array, and it looks like it is being mapped to [BLK Y][BLK X, THREAD X][BLK Z, THREAD Y].
I have never seen a single dimension assigned to a block index and a thread index in different directions before (here blockidx%z combined with threadidx%y).
What is actually happening here?
What does “Accelerator scalar kernel generated” mean, as compared to “Accelerator kernel generated”?
Does this mean a non-parallel kernel is being created as well?
Thanks!