variable in data clause is partially present on device

When I tried to speed up some Fortran code with OpenACC, I got the following error.

deleted block   device:0x7f868fa00000 size:3572224 thread 1 
FATAL ERROR: variable in data clause is partially present on the device: name=turre

This error occurs at run time, but not on every run. The variable is declared and copied to the device as follows:

dimension turre(0:jdim,0:kdim,0:idim,2),blank(jdim,kdim,idim),

!$acc enter data copyin(q,qi0,qj0,qk0,vist3d,vor,tk0)
!$acc enter data copyin(tj0,ti0,vol,si,sj,sk,k0,ux)
!$acc enter data copyin(zksav,bck,bci,bcj,vi0,vj0,vk0,damp1,smin)
!$acc enter data copyin(volj0,volk0,voli0,blank,blend,cmuv)
!$acc enter data copyin(fnu,v3dtmp,r,turre,zksav2,rhside,blend)

I am using PGI 18.4, and the compile command is:
mpif90 -DDIST_MPI -DDBLE_PRECSN -DP3D_SINGLE -DPG -DDBLE_PRECSN -fast -r8 -acc -ta=tesla -Mfixed -Minfo=accel -c twoeqn.F

You probably have to debug this by printing out information to understand what went wrong.

Some things to try:

Print out the size and shape of turre each time the subprogram is entered.

Print out the starting address each time the subprogram is entered. You can do something like
integer(8) addr
addr = transfer(loc(turre(0,0,0,1)),addr)
write(6,'(z16.16)') addr

Print out whether various sections of turre are present using acc_is_present() and zero in on what sections are and are not present, which should lead to either a program issue or a compiler issue.
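The three checks above could be combined into one helper called at the top of the subprogram. This is only a sketch: the routine name dbg_turre is hypothetical, the array bounds are taken from the declaration in the question, loc() is a compiler extension rather than standard Fortran, and the two slabs tested are just one possible way to split turre into sections.

```fortran
      subroutine dbg_turre(turre, jdim, kdim, idim)
      use openacc
      implicit none
      integer jdim, kdim, idim
      real turre(0:jdim,0:kdim,0:idim,2)
      integer(8) addr
c     Shape and total element count as seen by this call
      write(6,*) 'shape=', shape(turre), ' size=', size(turre)
c     Host starting address (loc is a compiler extension)
      addr = loc(turre(0,0,0,1))
      write(6,'(a,z16.16)') ' addr=', addr
c     Presence of the whole array and of each 4th-dimension slab
      write(6,*) 'all  :', acc_is_present(turre)
      write(6,*) 'slab1:', acc_is_present(turre(:,:,:,1))
      write(6,*) 'slab2:', acc_is_present(turre(:,:,:,2))
      end
```

If the shape or address changes between a call that works and one that fails, the data put on the device earlier no longer matches what the compute region expects.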

deleted block   device:0x7f3743c00000 size:2857984 thread 1 
deleted block   device:0x7f3746600000 size:3485184 thread 1 
deleted block   device:0x7f3746e00000 size:3572224 thread 1 
deleted block   device:0x7f3742e00000 size:3572224 thread 1 
FATAL ERROR: variable in data clause is partially present on the device: name=cmuv
 file:/home/cfl3d/cfl3dv6/build/cfl/libs/twoeqn.F twoeqn line:1078

Does OpenACC have a limit on the size or the number of variables? I have lots of variables to copy in, and I have enough GPU memory.

When I looked at the compiler output, it showed an implicit copyin of cmuv. What does "implicit" mean? I did not put “cmuv” in any data directive. However, when I do add “cmuv” to a data directive, the same error occurs.

1076, Generating copyin(smin(:,:,:),vist3d(:,:,:),blend(:,:,:),ux(:,:,:,:),fnu(:,:,:))
         Generating copy(rhside(:,:,:,:))
         Generating copyin(turre(:,:,:,:),damp1(:,:,:),q(:,:,:,:),vor(:,:,:))
   1078, Generating present(q(:,:,:,:),smin(:,:,:),vor(:,:,:),vist3d(:,:,:))
         Generating implicit copyin(cmuv(:jdim-1,:kdim-1,:idim-1))
         Generating present(ux(:,:,:,:),fnu(:,:,:),rhside(:,:,:,:),turre(:,:,:,:),damp1(:,:,:))
         Accelerator kernel generated
         Generating Tesla code

Hi xll_bit,

Does OpenACC have a limit on the size or the number of variables?

No, there’s no limit. Though if any individual allocatable array is greater than 2GB, please add the flag “-Mlarge_array” so the compiler uses 64-bit offsets for array indexing. For static arrays greater than 2GB, please use the flag “-mcmodel=medium”.

What does "implicit" mean?

In the absence of a data clause or if a compute region is within a structured data region within the same scoping unit, the compiler must implicitly copy the data to the device. The default copy uses “present_or” semantics meaning that if the data is present, i.e. in a higher level structured or unstructured data region, the compiler will detect this and use the data already present. Otherwise, it will copy the data.

If the data is in a higher level data region, you can put the variable in a “present” clause so the compiler knows to not implicitly add a copy clause.

FATAL ERROR: variable in data clause is partially present on the device: name=cmuv

There’s probably a mismatch between the size you used for cmuv in the higher level data region and the size the compiler is determining. The compiler attempts to copy the minimum size, so it uses the loop bounds to determine how much of the array will be used. Again, to keep the compiler from adding the implicit copy, put “cmuv” in a present clause on the compute region directive at line 1078 of twoeqn.F.

Note that you might want to double check your array bounds and make sure you’re not accessing the array out-of-bounds. “Partially present” can occur when the size of the data already present on the device is smaller than what the implicit copy is detecting. It could also be that the compiler is not getting the size correctly, in which case adding the present clause should fix this.
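As a minimal sketch of that suggestion (a stand-alone toy program, not the code from twoeqn.F; the array r, the size n and the loop body are made up for illustration), the whole arrays are placed on the device once and the compute region is told they are already there:

```fortran
      program present_demo
      implicit none
      integer, parameter :: n = 64
      real cmuv(n,n), r(n,n)
      integer i, j
      cmuv = 0.09
c     Whole arrays go to the device once, in an unstructured region
!$acc enter data copyin(cmuv) create(r)
c     The loops touch only 1..n-1. Without present(), the compiler
c     may generate an implicit copyin of just that sub-range.
!$acc parallel loop collapse(2) present(cmuv, r)
      do j = 1, n-1
        do i = 1, n-1
          r(i,j) = 2.0*cmuv(i,j)
        end do
      end do
c     Bring the result back and release the device copies
!$acc update self(r)
!$acc exit data delete(cmuv, r)
      write(6,*) r(1,1)
      end
```

With the present clause in place, the -Minfo=accel output should list cmuv under "Generating present" rather than "Generating implicit copyin".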

Hope this helps,
Mat

Hi Mat,
Thanks for your reply. The “present” clause did not solve the error. Here is a test case from my project. It produces the same error, which has puzzled me for many days.

      subroutine inline_kernels(flag,xdim,ydim,zdim,a,b,
     +   aa,bb,c, d, e, f)
      use openacc
        implicit none
c      use data_f
      integer(4) :: y, x,z,xdim,ydim,zdim
      integer flag

      real a(xdim-1,ydim-1,zdim-1),b(xdim-1,ydim-1,zdim-1),
     + c(xdim-1,ydim-1,zdim-1),d(xdim-1,ydim-1,zdim-1),
     + aa(xdim-1,ydim-1,zdim-1),bb(xdim-1,ydim-1,zdim-1),
     + f(xdim-1,ydim-1,zdim-1),e(xdim-1,ydim-1,zdim-1)
!$acc data copyin(a,aa,bb,
!$acc& b,c,d) copyout(e,f) 
!$acc parallel present(a,b,c,d,f,aa,bb,
!$acc& e) 
      if (flag .eq. 1) then
!$acc loop independent 
        do x=1,xdim-1 
         do y=1,ydim-1 
          do z=1,zdim-1
           e(x, y, z)= a(x, y, z)+ b(x, y, z) 
          end do 
         end do 
        end do 
      else if (flag .NE. 1) then
!$acc loop independent 
        do x=1,xdim-1 
         do y=1,ydim-1 
          do z=1,zdim-1
           e(x, y, z)= c(x, y, z)* d(x, y, z) 
           f(x, y, z)= aa(x, y, z)* bb(x, y, z) 
          end do 
         end do 
        end do 
      end if
!$acc end parallel 
!$acc end data 
      end



      program main
        implicit none
      real, allocatable :: c(:,:,:), d(:,:,:),e(:,:,:),f(:,:,:) 
      real, allocatable :: a(:,:,:), b(:,:,:),aa(:,:,:),bb(:,:,:) 
      integer :: x, y, z,ii,xdim,ydim,zdim
      integer :: fail_x, fail_y, fail_z 
      integer test,flag
      real xi,yi,zi
      
      flag = 1
      xdim=256
      ydim=256
      zdim=10
      allocate(c(xdim-1,ydim-1,zdim-1))
      allocate(d(xdim-1,ydim-1,zdim-1))
      allocate(e(xdim-1,ydim-1,zdim-1))
      allocate(f(xdim-1,ydim-1,zdim-1))
      allocate(a(xdim-1,ydim-1,zdim-1))
      allocate(aa(xdim-1,ydim-1,zdim-1))
      allocate(b(xdim-1,ydim-1,zdim-1))
      allocate(bb(xdim-1,ydim-1,zdim-1))

      a(:,:,:) = 2.0d0 
      b(:,:,:) = 3.0d0 
      aa(:,:,:) = 2.0d0 
      bb(:,:,:) = 3.0d0 
      c(:,:,:) = 2.0d0 
      d(:,:,:) = 3.0d0 
      e(:,:,:) = 0.0d0 
      f(:,:,:) = 0.0d0 
      test = 1 

      do ii=1,100
        call random_number(xi)
        call random_number(yi)
        call random_number(zi)
        xdim = floor(xi*100)+50
        ydim = floor(yi*100)+50
        zdim = floor(zi*100)+50
        call inline_kernels(flag,xdim,ydim,zdim,a,b,aa,bb,c, d, e, f) 
        write(6,*) "ii=",ii
      enddo

      write(6,*) "calculation complete" 

      do y=1,xdim-1 
       do x=1,ydim-1
        do z=1,zdim-1
         if (test .EQ. 1 .AND. flag .EQ. 1 
     +    .AND. e(x, y, z) .NE. 5.0d0) then 
          test = 2 
          fail_x = 10 
          fail_y = y 
          fail_z = z 
         end if 
         if (test .EQ. 1 .AND. flag .NE. 1
     +    .AND. f(x, y, z) .NE. 6.0d0) then 
          test = 2 
          fail_x = 20 
          fail_y = y 
          fail_z = z 
         end if 
        end do 
       end do 
      end do 

      if (test .EQ. 1) then 
      write(6,*) "test ok" 
      else 
      write(6,*) "test failed" 
      write(6,*) "fails at", fail_x, fail_y, fail_z, "E:", 
     + e(fail_x, fail_y, fail_z), "F:", f(fail_x, fail_y, fail_z) 
      end if 
      deallocate(a)
      deallocate(b)
      deallocate(aa)
      deallocate(bb)
      deallocate(c)
      deallocate(d)
      deallocate(e)
      deallocate(f)

      stop 
      end program main

And here is my compile command

mpif90 inline_kernels.F data_test_0403.F -acc -Minfo=accel -ta=tesla -g -c
mpif90 -acc *.o -o test

Here is the error report

$ ./test 
 ii=            1
aa lives at 0x7f5e8906ba70 size 7397376 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 6.0, threadid=1
host:0x7f5e892a8860 device:0x7f5e66400000 size:7397376 presentcount:1+0 line:13 name:a
allocated block device:0x7f5e66400000 size:7397376 thread:1
deleted block   device:0x7f5e62c00000 size:2079744 thread 1 
deleted block   device:0x7f5e62e00000 size:2079744 thread 1 
deleted block   device:0x7f5e63000000 size:2079744 thread 1 
deleted block   device:0x7f5e63200000 size:2079744 thread 1 
deleted block   device:0x7f5e63400000 size:2079744 thread 1 
deleted block   device:0x7f5e63600000 size:2079744 thread 1 
deleted block   device:0x7f5e63800000 size:2079744 thread 1 
deleted block   device:0x7f5e63a00000 size:2079744 thread 1 
FATAL ERROR: variable in data clause is partially present on the device: name=aa
 file:/home/data_test/inline_kernels.F inline_kernels line:13

Thanks xll_bit.

One thing to do before compiling with OpenACC is to ensure that your program runs correctly without the directives enabled.

When I do this, the code fails with a seg fault. The problem is that you allocate the arrays with zdim set to 10 but then pass much larger zdim values to the subroutine, so the subroutine's dummy arrays are declared larger than the storage actually allocated. Hence the seg fault.

When I set the initial zdim to 256, the code runs correctly with or without the OpenACC directives enabled.
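One way to catch this kind of mismatch before the call is to compare the element count the subroutine will declare against what was actually allocated. The following is only a sketch: the program name check_extent is hypothetical, and the dimensions 120 are an arbitrary example from the range the test case generates.

```fortran
      program check_extent
      implicit none
      real, allocatable :: a(:,:,:)
      integer xdim, ydim, zdim
      integer(8) need
c     Same allocation as the test case: xdim=ydim=256, zdim=10
      allocate(a(255,255,9))
c     Hypothetical dimensions later passed to the subroutine
      xdim = 120
      ydim = 120
      zdim = 120
c     Elements the dummy a(xdim-1,ydim-1,zdim-1) would cover
      need = int(xdim-1,8)*int(ydim-1,8)*int(zdim-1,8)
      if (need .gt. size(a,kind=8)) then
        write(6,*) 'dummy needs', need, 'elements; actual has',
     +    size(a,kind=8)
      else
        write(6,*) 'extent ok'
      end if
      deallocate(a)
      end
```

Here the allocation holds 255*255*9 = 585225 elements while the dummy would cover 119*119*119 = 1685159, so the data clauses in the subroutine would ask for more than the host array contains.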

When zdim=10

% pgfortran test.F
% a.out
 ii=            1
Segmentation fault

When zdim=256

% pgfortran test.F -ta=tesla -V18.10
% a.out
 ii=            1
 ii=            2
 ii=            3
 ii=            4
 ii=            5
... cut ...
 ii=           98
 ii=           99
 ii=          100
 calculation complete
 test failed
 fails at           10            1           42 E:    0.000000     F:
    0.000000
Warning: ieee_inexact is signaling
FORTRAN STOP

Hope this helps,
Mat