Openacc fortran array negative index question

Hello Mat.

This is my code.
fortran openacc, using A100 gpu, nvhpc/22.11,
compile flags
ACC = -fast -acc -Minfo=accel -ta=tesla:cc80

My code

subroutine subroutine_name()
    USE MODULES
    IMPLICIT NONE
    real*8, allocatable, dimension(:,:,:) :: ValueTMP
    INTEGER*8 :: I,J,K,II,JJ,KK

    allocate(ValueTMP(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1))
    ValueTMP= 1.d0

    !$acc enter data copyin (ValueTMP(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1))

    !!! something

    !$acc parallel loop collapse(3) present(ValueTMP(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1), T(:,:,:,:))
    do k = 1, n3sub-1
    do j = 1, n2sub-1
    do i = 1, n1sub-1
        ValueTMP(i,j,k)=T(I,J,K,MS)
    end do
    end do
    end do
    !$acc end loop

    call mpi_function1()
    call mpi_function2()

    !!! something

    !$acc exit data delete (ValueTMP)
end subroutine subroutine_name

compile message

628, Generating enter data copyin(valuetmp(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1))
631, Generating present(t(:,:,:,:),valuetmp(:,:,:))

error message

FATAL ERROR: variable in data clause is partially present on the device: name=valuetmp
 file:/home/jsera.lee/lica/LICA/apply_openACC/src/mod_passivescalar.f90 rhs_ibm_ps line:628

Debug

valuetmp lives at 0x7ffedec6c000 size 54080000 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 8.0, threadid=1
host:0x78b130 device:0x7fffb3302200 size:128 presentcount:0+1 line:150 name:descriptor
.........
.........
host:0x7ffedb8d0df0 device:0x7ffed2000000 size:54080000 presentcount:0+1 line:1256 name:rhstmp
host:0x7ffedec6cbe0 device:0x7ffed6000000 size:54080000 presentcount:0+1 line:1256 name:valuetmp
host:0x7ffeeafff9d0 device:0x7ffee6000000 size:50331648 presentcount:0+1 line:125 name:nwall_dvm_sub
.........
.........
allocated block device:0x7fffb33e3c00 size:512 thread:1
deleted block   device:0x7fffb33e4000 size:512 threadid=1 
deleted block   device:0x7fffb33e3e00 size:512 threadid=1 
deleted block   device:0x7fffb33e4400 size:512 threadid=1 
deleted block   device:0x7fffb33e4200 size:512 threadid=1 
deleted block   device:0x7fffb33e7a00 size:512 threadid=1 
FATAL ERROR: variable in data clause is partially present on the device: name=valuetmp
 file:/home/jsera.lee/lica/LICA/apply_openACC/src/mod_passivescalar.f90 rhs_ibm_ps line:628

line:628 means → “!$acc enter data copyin (ValueTMP(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1))”

I wonder that is openACC can not use -1 start index?
I need negative index because of some MPI functions. It contains ghost cell communication.
I can’t understand why partially present message

cpu buffer size → allocate(ValueTMP(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1))
gpu buffer size → !$acc enter data copyin (ValueTMP(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1))
It looks like same array size to me

Is there anything i missing?

Negative indices should be fine.

I can’t understand why partially present message

The OpenACC runtime keeps a “present table” which is basically a hash table mapping a host to device address plus the size of the memory block. A partially present error means two host address are within the same range (i.e. they’re both present) but the size of one extends beyond the other.

For example you could allocate a large memory block and then have other pointers into this block. So long as the sub-block of memory doesn’t extend beyond the end of the large block, then it’s fine. Granted both host pointers will be using the same device memory, but for things like pool allocators it can be useful.

The partial present error would come when the second pointer’s size goes beyond the end of the larger block. In this case part of the second array is already present, but not all.

While there can be multiple causes, given your code, there’s a high probability that you missed deleting a local temp array. Given these arrays are on the stack, if you miss deleting one, then when entering another subroutine where the same stack addresses are used, creating a new device array can cause a partially present error.

The present table dump shows that you already have a “rhstmp” and “valuetmp” array which were created on line 1256. I’m presuming that these are also local arrays coming from a different subroutine. If I’m correct, then these weren’t deleted so when the line 628 “valuetmp” gets created, it’s stack host address is partially present.

Of course this is just an educated guess and without a reproducing example I can’t be sure, but double check that you deleted all local arrays before exiting the subroutine.

-Mat

1 Like

I found something wrong.
I used valuetmp in two sections.
To distinguish, I changed the name of the array.
valuetmp → valuetmp1, valuetmp2
valuetmp1 is used at mod_flowarray.f90 in RHS_IBM subroutine
valuetmp2 is used at mod_passivescalar.f90 in RHS_IBM_PS subroutine

You mentioned it

The present table dump shows that you already have a “rhstmp” and “valuetmp” array which were created on line 1256. I’m presuming that these are also local arrays coming from a different subroutine. If I’m correct, then these weren’t deleted so when the line 628 “valuetmp” gets created, it’s stack host address is partially present.

now line 1256 means valuetmp1.
but valuetmp2 partially present error was still remained.

this is runtime error message (dump)

valuetmp2 lives at 0x7fca6ac6c000 size 54080000 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 8.0, threadid=1
host:0x899d30 device:0x7fcb41302200 size:128 presentcount:0+1 line:150 name:descriptor
host:0x899dc0 device:0x7fcb41306400 size:128 presentcount:0+1 line:151 name:descriptor
host:0x899e50 device:0x7fcb41308e00 size:128 presentcount:0+1 line:152 name:descriptor

host:0x465ac10 device:0x7fcab5000000 size:2146304 presentcount:0+1 line:217 name:kfc_ps_sub
host:0x486cd30 device:0x7fca97a00000 size:2146304 presentcount:0+1 line:216 name:fcv_ps_sub
host:0x4a7f060 device:0x7fca9ba00000 size:2146304 presentcount:0+1 line:216 name:fcvavg_ps_sub
host:0x4c915a0 device:0x7fca81e00000 size:1810176 presentcount:0+1 line:218 name:mmpi_ps_sub
host:0x4e51e00 device:0x7fca81a00000 size:2146304 presentcount:0+1 line:183 name:dpsdtr_sub
host:0x55d3660 device:0x7fcab5400000 size:2813952 presentcount:0+1 line:124 name:inz_sub
host:0x5889c20 device:0x7fcab5800000 size:2813952 presentcount:0+1 line:124 name:jnz_sub
host:0x5b403f0 device:0x7fcab5c00000 size:2813952 presentcount:0+1 line:124 name:knz_sub
host:0x7fca678d0df0 device:0x7fca5e000000 size:54080000 presentcount:0+1 line:1257 name:rhstmp
host:0x7fca6ac6cbe0 device:0x7fca62000000 size:54080000 presentcount:0+1 line:1256 name:valuetmp1
host:0x7fca77ff79d0 device:0x7fca72000000 size:50331648 presentcount:0+1 line:125 name:nwall_dvm_sub
host:0x7fca82d45c70 device:0x7fca6e000000 size:52186176 presentcount:0+1 line:766 name:qmass_sub
host:0x7fca85f0fa60 device:0x7fca7e000000 size:50813568 presentcount:0+1 line:421 name:gfi_sub
host:0x7fca8b076740 device:0x7fcab9000000 size:16291584 presentcount:0+1 line:219 name:gfi_ps_sub

host:0x7fcb095576d0 device:0x7fcad8000000 size:50331648 presentcount:0+1 line:121 name:rk3xoo
host:0x7fcb0c5594c0 device:0x7fcadc000000 size:50331648 presentcount:0+1 line:120 name:rk3zo
host:0x7fcb0f55b2b0 device:0x7fcae0000000 size:50331648 presentcount:0+1 line:119 name:rk3yo
host:0x7fcb1255d0a0 device:0x7fcae4000000 size:50331648 presentcount:0+1 line:118 name:rk3xo
host:0x7fcb1555ee90 device:0x7fcae8000000 size:156558528 presentcount:0+1 line:117 name:rhs1
host:0x7fcb1eaaec80 device:0x7fcaf2000000 size:52186176 presentcount:0+1 line:116 name:p
host:0x7fcb21c74a70 device:0x7fcaf6000000 size:52186176 presentcount:0+1 line:115 name:w
host:0x7fcb24e3a860 device:0x7fcafa000000 size:52186176 presentcount:0+1 line:114 name:v
host:0x7fcb6050d650 device:0x7fcafe000000 size:52186176 presentcount:0+1 line:113 name:u
allocated block device:0x7fca5e000000 size:54080000 thread:1
allocated block device:0x7fca62000000 size:54080000 thread:1
allocated block device:0x7fca6e000000 size:52186624 thread:1
allocated block device:0x7fca72000000 size:50331648 thread:1
allocated block device:0x7fca7e000000 size:50813952 thread:1
allocated block device:0x7fca81200000 size:6841344 thread:1


allocated block device:0x7fcb413e3a00 size:512 thread:1
allocated block device:0x7fcb413e3c00 size:512 thread:1
deleted block   device:0x7fcb413e4000 size:512 threadid=1 
deleted block   device:0x7fcb413e3e00 size:512 threadid=1 
deleted block   device:0x7fcb413e4400 size:512 threadid=1 
deleted block   device:0x7fcb413e4200 size:512 threadid=1 
deleted block   device:0x7fcb413e7a00 size:512 threadid=1 
FATAL ERROR: variable in data clause is partially present on the device: name=valuetmp2
 file:/home/jsera.lee/lica/LICA/apply_openACC/src/mod_passivescalar.f90 rhs_ibm_ps line:628

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[44575,1],0]
  Exit code:    1
--------------------------------------------------------------------------

this is my code

subroutine rhs_ibm_ps ()
    USE MODULES
    IMPLICIT NONE
    real*8, allocatable, dimension(:,:,:) :: ValueTMP2
    INTEGER*8 :: I,J,K,II,JJ,KK

    allocate(ValueTMP2(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1))
    ValueTMP2= 1.d0

    !$acc enter data copyin (ValueTMP2(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1))

    !!! something

    !$acc parallel loop collapse(3) present(ValueTMP2(-1:n1sub+1,-1:n2sub+1,-1:n3sub+1), T(:,:,:,:))
    do k = 1, n3sub-1
    do j = 1, n2sub-1
    do i = 1, n1sub-1
        ValueTMP2(i,j,k)=T(I,J,K,MS)
    end do
    end do
    end do
    !$acc end loop

    call mpi_function1()
    call mpi_function2()

    !!! something

    !$acc exit data delete (ValueTMP2)
end subroutine rhs_ibm_ps 

Is ValueTMP1 related to this ValueTMP2 error?
I don’t know why the ValueTMP2 partially present error occurs.

Is it important the order?

I mean this two code.
I think it is not important because delete gpu buffer, not use.
If i use “!$acc exit data copyout” the order is important. I understand this.

deallocate(ValueTMP2)
!$acc exit data delete (ValueTMP2)
!$acc exit data delete (ValueTMP2)
deallocate(ValueTMP2)

Is this deallocate cpu buffer, exit data delete gpu buffer can occur the partially present error?

It can be since, while rare, the OS could re-use the same address in a different allocation. Best practice is to put the exit data delete before the deallocate.

Again, the partial present of ValueTMP2 may be coming from a missing delete of a different variable in a different routine. These are stack variables. The stack gets pushed and popped each time entering and exiting a subroutine and the stack addresses are reused.

The present table dump shows about 16 variables that look to have host stack addresses. Are you deleting the local arrays in the RHS_IBM routine?

1 Like

All of the projects have been changed
First, deallocation the GPU buffer
After, deallocate cpu buffer using “deallocate()”

After that, there is no longer a partiaally Present error was occured.
Finally, my problem has been solved.

The problem has been solved but I can’t understand something.
I thought about OpenAcc like this.

!$acc enter data create -> cudaMalloc
!$acc exit data delete -> cudaFree
!$acc enter data copyin -> cudaMalloc, cudaMemcpy(hostToDevice) 
!$acc enter data copyout -> cudaMalloc, cudaMemcpy(deviceToHost)
!$acc enter data copy -> cudaMemcpy(hostToDevice), cudaMemcpy(deviceToHost)

So I don’t know why this is wrong.

deallocate(arr)
!$acc exit data delete(arr)

I didn’t use “arr” information. and memory free was looks like successful (cpu/gpu both of them.)
It is just my opinion, i think the compiler have to leave a warning or error message.

And other thing. you said before.

The present table dump shows about 16 variables that look to have host stack addresses. Are you deleting the local arrays in the RHS_IBM routine?

Dynamic allocation is usually assigned to the HEAP area, and it is known that local variable and local Array are assigned to the stack.
In my code all arrays are used as dynamic allocation, and only Scalar variables are being used as local scalar variable.

Local Array thought it would be automatically released when the subroutine ended even if the subroutine was declared, but I did not understand that deleteing the local array.

The array is free’d when deallocated so no longer associated with a host address. Also arr’s pointer can be set to null or the host address re-used by the OS. Basically, the behavior of accessing a pointer after it’s free’d is undefined. So it may work sometimes, fail other times. Again, best to delete the device array before the host array so the host to device address mapping is still valid.

Dynamic allocation is usually assigned to the HEAP area, and it is known that local variable and local Array are assigned to the stack.

The data the array points to would be on the heap, but the array itself, i.e. the descriptor, would be on the stack.

1 Like

I understood that it was caused by a connection table between the device and the host.
Thank you Mat!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.