Fatal Error in ACC kernels

Hello,

I’m having an issue where I’ve wrapped some code in an ACC kernels directive. The code compiles (seemingly successfully), but at run time I get this error:

FATAL ERROR: data in PRESENT clause was not found on device 1: name=ytop2horiz
 file:/home/nearl/ELC/loopELC.f90 diskvisib line:13003

However, when I look at the ACC Minfo output, it shows that ytop2horiz should have been copied to the device:

Generating present_or_copyin(ytop2horiz(:ntop2))

Any direction with this issue would be greatly appreciated!

Nick

Hi Nick,

I would double-check that the Minfo message is from the same spot as the error at line 13003 of loopELC.f90; it’s possible that there is a present clause there. Note that data clauses are matched by address, not by name, so the variable may go by a different name if it is passed into a subroutine or accessed through a pointer.
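A minimal sketch of the address-association point, assuming free-form Fortran (the module and routine names are hypothetical). The caller creates the device copy under the name ytop2horiz, while the called routine refers to the same memory as buf. The present clause should still succeed because the runtime looks the data up by address, not by name. If the call were made outside the data region, the same present clause would be expected to fail at run time with a "not found on device" error like the one above. Compiled without -acc, the directives are treated as comments and the code simply runs on the host.

```fortran
module addr_assoc_demo
  implicit none
contains
  ! Callee: its dummy argument is named "buf", not "ytop2horiz".
  ! The present clause is matched against existing device data by
  ! address, so it finds the copy the caller created.
  subroutine scale_on_device(buf, n)
    integer, intent(in) :: n
    real, intent(inout) :: buf(n)
    integer :: i
    !$acc kernels present(buf(:n))
    do i = 1, n
      buf(i) = 2.0 * buf(i)
    end do
    !$acc end kernels
  end subroutine scale_on_device

  ! Caller: creates the device copy under its own name, then calls the
  ! routine above while the data region is still active.
  subroutine caller(ytop2horiz, ntop2)
    integer, intent(in) :: ntop2
    real, intent(inout) :: ytop2horiz(ntop2)
    !$acc data copy(ytop2horiz(:ntop2))
    call scale_on_device(ytop2horiz, ntop2)
    !$acc end data
  end subroutine caller
end module addr_assoc_demo
```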

Otherwise, I’ll need more details about the code and the directives you’re using. Ideally, a reproducing example would be very helpful.

  • Mat

Unfortunately, I can’t post the code itself. However, Minfo displays the line number 13003 as well:

  13003, Generating present_or_copy(xskydisk(:nradius,:ntheta))
         Generating present_or_copy(yskydisk(:nradius,:ntheta))
         Generating present_or_copy(zskydisk(:nradius,:ntheta))
         Generating present_or_copyin(dtopy(:ndtop))
         Generating present_or_copyin(dtopx(:ndtop))
         Generating present_or_copyin(ytop2horiz(:ntop2))

To give some idea of how the code is set up:

subroutine mainsubroutine(xtop2horiz, ytop2horiz, x2darray, y2darray, x2d, y2d, ntop2)

dimension xtop2horiz(ntop2), ytop2horiz(ntop2), x2darray(x2d, y2d), y2darray(x2d, y2d)

!$acc kernels
do j = 1, N
    do i = 1, N
        if (numb.ge.100) then
            returnedint = someint
            returnedfloat = somefloat
            call somesubroutine(xtop2horiz, ytop2horiz, x2darray, y2darray, returnedint, returnedfloat)
        endif
    enddo
enddo
!$acc end kernels

return
end

The compiler inlines all the routines, which seems to work well for the other parts of the code where I’ve done this. It’s only when I add the kernels and end kernels directives around this particular piece of code that the program fails with the error at run time (it compiles without any errors).

I realize this isn’t very useful, but if you have any ideas I’d greatly appreciate it.

Hi Nick,

What happens if you use an explicit data clause?

!$acc kernels copyin(dtopy(:ndtop), dtopx(:ndtop), ytop2horiz(:ntop2)), &
!$acc         copy(xskydisk(:nradius,:ntheta), yskydisk(:nradius,:ntheta), zskydisk(:nradius,:ntheta))
do j = 1, N
    do i = 1, N

  • Mat

Thanks for your help, Mat. Unfortunately, using explicit data clauses makes no difference. However, I believe I found the issue: ntop2 at one point has a value of zero. I’m guessing that perhaps the device allocation for ytop2horiz is skipped if its length is zero?


The compiler will handle the case where the array has length zero, so I’m not sure why you’d get the “present” error. Perhaps it has something to do with the inlined routine.

  • Mat

I think I may have found my issue. The problem seems to occur when the compiler automatically sizes the device copies of arrays from the DO loop limits. For instance, consider the mock code below:

      SUBROUTINE TEST(alpha, amax, beta)

      INTEGER i, j, alpha, amax, beta
      DOUBLE PRECISION tarray, betalim
      DIMENSION tarray(beta, alpha), betalim(amax)

c
c$acc kernels loop
c
      DO i = 1, alpha
          DO j = 1, betalim(amax)
              IF (i.GT.betalim(j)) EXIT

              tarray(j,i) = 10

          ENDDO
      ENDDO

      END

When I compile the actual code, I get output like:

Generating present_or_copy(tarray(:ibetlim,:alpha))

Compilation finishes successfully, but running the program yields:

FATAL ERROR: data in PRESENT clause was not found on device 1: name=tarray

When I change the upper limit of the inner loop to a plain integer variable (alpha, for instance), the error goes away and the program runs fine.

Hi nchlsearl,

Can you please post a reproducing example? I tried using your mock code but don’t see the same feedback message. Also, what is the “ibetlim” variable?

Thanks,
Mat

% cat test.f
      SUBROUTINE TEST(alpha, amax, beta)

      INTEGER i, j, alpha, amax, beta
      DOUBLE PRECISION tarray, betalim
      DIMENSION tarray(beta, alpha), betalim(amax)

!$acc kernels loop
      DO i = 1, alpha
          DO j = 1, betalim(amax)
              IF (i.GT.betalim(j)) EXIT
              tarray(j,i) =  real(i+j)
          ENDDO
      ENDDO

      print *, tarray
      END

% pgf90 -c -acc -Minfo=accel test.f -V13.7
test:
      7, Generating present_or_copyin(betalim(:))
         Generating present_or_copy(tarray(:,:alpha))
         Generating NVIDIA code
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
      8, Loop is parallelizable
         Accelerator kernel generated
          8, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
      9, Inner sequential loop scheduled on accelerator

Here is a test case that reproduces the problem. The ibetlim should have just been betlim, sorry about that.

      PROGRAM TEST_CASE

      IMPLICIT NONE

      INTEGER i, j, Alpha, Beta
      INTEGER Firstarray, Secondarray
      INTEGER Betlim

      DIMENSION Firstarray(Beta,Alpha),
     &          Secondarray(Beta,Alpha), Betlim(Alpha)

      Alpha = 100
      Beta = 100
c
c$acc kernels loop
c
      DO i = 1, Alpha
        DO j = 1, Betlim(Alpha)
            IF ( j.GT.Betlim(i) ) EXIT
            Firstarray(j,i) = 10
            Secondarray(j,i) = 20
        ENDDO
      ENDDO

      PRINT *, Firstarray(10,10), Secondarray(10,10)

      END PROGRAM TEST_CASE

This compiles fine using

pgfortran -fast -acc -Minfo=acc test_case.for -o test_case

with output

test_case:
     15, Generating present_or_copy(firstarray(:betlim,:100))
         Generating present_or_copy(secondarray(:betlim,:100))
         Generating present_or_copyin(betlim(:100))
         Generating NVIDIA code
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     17, Loop is parallelizable
         Accelerator kernel generated
         17, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     18, Inner sequential loop scheduled on accelerator

Running the program yields

FATAL ERROR: data in PRESENT clause was not found on device 1: name=firstarray
 file:/home/nearl/ELC/test_case.for test_case line:15

Thanks again for all your help.

Hi nchlsearl,

Looks like the release compiler is not catching a problem with your code. With our development compiler I get the following:

% pgf90 -acc -Minfo test.f90 -Mfixed -Vdev
PGF90-S-0310-Adjustable array can not have automatic bounds specifiers - firstarray (test.f90)
PGF90-S-0310-Adjustable array can not have automatic bounds specifiers - secondarray (test.f90)
PGF90-S-0310-Adjustable array can not have automatic bounds specifiers - betlim (test.f90)
  0 inform,   0 warnings,   3 severes, 0 fatal for test_case

To fix this, make Alpha and Beta parameters so that the arrays are sized correctly.

% cat test2.f90
      PROGRAM TEST_CASE

      IMPLICIT NONE

      INTEGER i, j, Alpha, Beta
      PARAMETER (Alpha=100, Beta=100)
      INTEGER Firstarray, Secondarray
      INTEGER Betlim

      DIMENSION Firstarray(Beta,Alpha),
     &          Secondarray(Beta,Alpha), Betlim(Alpha)

c      Alpha = 100
c      Beta = 100
c
c$acc kernels loop
c
      DO i = 1, Alpha
        DO j = 1, Betlim(Alpha)
            IF ( j.GT.Betlim(i) ) EXIT
            Firstarray(j,i) = 10
            Secondarray(j,i) = 20
        ENDDO
      ENDDO

      PRINT *, Firstarray(10,10), Secondarray(10,10)

      END PROGRAM TEST_CASE

% pgf90 -acc -Minfo test2.f90 -V13.7 -Mfixed
test_case:
     16, Generating present_or_copy(firstarray(:betlim,:))
         Generating present_or_copy(secondarray(:betlim,:))
         Generating present_or_copyin(betlim(:))
         Generating NVIDIA code
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     18, Loop is parallelizable
         Accelerator kernel generated
         18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     19, Inner sequential loop scheduled on accelerator
% a.out
            0            0

Hope this helps,
Mat

Thanks Mat,

This definitely helped with some of the issues. However, in some places Alpha and Beta are passed as arguments to a subroutine that contains nested loops, and I can’t make them parameters without hard-coding their values (which I want to avoid, because the Alpha and Beta limits can differ).

Thanks for all your time and effort,
Nick

Passing Alpha and Beta into a subroutine and then using them to size automatic arrays should be fine. You just can’t do this in the main program, since Alpha and Beta’s values are undefined there when the arrays are declared.
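A minimal sketch of that approach, assuming free-form Fortran; the module limits_demo and the routine fill_arrays are hypothetical names. Alpha and Beta arrive as ordinary dummy arguments, so the dummy arrays sized by them (adjustable arrays) have well-defined extents, and no PARAMETER is needed. The sketch also assumes betlim is filled before the call and that betlim(alpha) does not exceed beta, so the inner loop stays in bounds.

```fortran
module limits_demo
  implicit none
contains
  ! alpha and beta are ordinary dummy arguments, so they have defined
  ! values when the array bounds below are evaluated.
  ! Assumes betlim(alpha) <= beta so that j stays within bounds.
  subroutine fill_arrays(alpha, beta, betlim, firstarray, secondarray)
    integer, intent(in) :: alpha, beta
    integer, intent(in) :: betlim(alpha)
    integer, intent(inout) :: firstarray(beta, alpha)
    integer, intent(inout) :: secondarray(beta, alpha)
    integer :: i, j

    ! Explicit data clauses: the shapes are known from the arguments,
    ! so whole arrays can be named without relying on loop limits.
    !$acc kernels loop copyin(betlim) copy(firstarray, secondarray)
    do i = 1, alpha
      do j = 1, betlim(alpha)
        if (j > betlim(i)) exit
        firstarray(j, i) = 10
        secondarray(j, i) = 20
      end do
    end do
  end subroutine fill_arrays
end module limits_demo
```

In the calling program, Alpha and Beta can be read or computed at run time and the arrays allocated with those extents before the call, so different calls can use different limits.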

Are you able to send the code to PGI Customer Support (trs@pgroup.com)?

  • Mat

Hi Nick,

Dave sent me your code, and I could see the problem without even running it: you’re using “-Mconcur” in your flags. This flag enables auto-parallelization, which creates multiple host threads. Since each thread has a separate device context, data allocated on the device from one thread is not visible from another thread.

Please remove this flag and try again.

Note that there are other issues with your OpenACC directives that I’m looking into. I hope to send you updated code soon.

  • Mat