What is the difference between using several kernels directives and one?

I don't understand the difference between case 1 and case 2.

Case 1

!$acc kernels
  do loop block(1)
!$acc end kernels
!$acc kernels
  do loop block(2)
!$acc end kernels
!$acc kernels
  do loop block(3)
!$acc end kernels
!$acc kernels
  do loop block(4)
!$acc end kernels

Case 2

!$acc kernels
  do loop block(1)
  do loop block(2)
  do loop block(3)
  do loop block(4)
!$acc end kernels

In these two cases, how does the PGI compiler interpret the Fortran code?

I think that if there are no variable dependencies between the loops, the two cases are exactly the same. Is that right?

If not, how does the PGI compiler treat each case?

Hi P. Shim,

Assuming no dependencies in your loop blocks, the compiler will create 4 separate kernels in both cases.

The main difference is the scope of the implicit data region. Since each kernels region has an implicit data region, data may be copied for each of the 4 regions in the first case, but only once in the second. Using an explicit data region around the first case would be advisable to help prevent excessive data movement between the host and device.
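As a rough sketch of what an explicit data region around the first case could look like: the arrays `a` and `b`, the size `n`, and the loop bodies are made up for illustration and stand in for your actual loop blocks. The `copy` clause is one possible choice; `copyin`, `copyout`, or `create` may fit better depending on how each array is used.

```fortran
program data_region_sketch
  implicit none
  integer, parameter :: n = 1000   ! hypothetical problem size
  real :: a(n), b(n)               ! hypothetical arrays used by the loops
  integer :: i

  a = 1.0
  b = 0.0

  ! One explicit data region spans all four kernels regions, so a and b
  ! are transferred at its boundaries rather than at each kernels region.
  !$acc data copy(a, b)

  !$acc kernels
  do i = 1, n
    a(i) = a(i) + 1.0
  end do
  !$acc end kernels

  !$acc kernels
  do i = 1, n
    b(i) = 2.0 * a(i)
  end do
  !$acc end kernels

  !$acc kernels
  do i = 1, n
    a(i) = a(i) + b(i)
  end do
  !$acc end kernels

  !$acc kernels
  do i = 1, n
    b(i) = b(i) - a(i)
  end do
  !$acc end kernels

  !$acc end data

  print *, a(1), b(1)
end program data_region_sketch
```

Built without OpenACC enabled, the directives are treated as plain comments and the program runs on the host, which is a handy way to check results against the accelerated build.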

A second difference is that the CPU will block after each kernel in the first case before launching the next kernel. In the second case, the CPU will launch all 4 kernels on the same stream before blocking at the end of the region. Hence in the second case, the overhead cost of launching 3 of the kernels is hidden. While not a huge difference, the second case will be slightly faster.
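If you need to keep the regions separate, one possible way to avoid the host blocking after each one is the OpenACC `async` clause. This is only a sketch under assumptions: the arrays, sizes, loop bodies, and the queue number `1` are made up for illustration, and whether it pays off depends on your code. Regions placed on the same async queue run in order, so the second loop can still safely use the result of the first.

```fortran
program async_sketch
  implicit none
  integer, parameter :: n = 1000   ! hypothetical problem size
  real :: a(n), b(n)               ! hypothetical arrays used by the loops
  integer :: i

  a = 1.0
  b = 2.0

  !$acc data copy(a, b)

  ! The host queues this region on async queue 1 and continues immediately.
  !$acc kernels async(1)
  do i = 1, n
    a(i) = 2.0 * a(i)
  end do
  !$acc end kernels

  ! Same queue, so this region starts only after the previous one finishes.
  !$acc kernels async(1)
  do i = 1, n
    b(i) = b(i) + a(i)
  end do
  !$acc end kernels

  ! Block once here, before the data region copies a and b back.
  !$acc wait(1)

  !$acc end data

  print *, a(1), b(1)
end program async_sketch
```

The `wait` before `end data` matters: without it, the copy back to the host could happen before the queued kernels have finished.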

Hope this helps,
Mat