12.4: problem with the OpenACC present directive

This problem happens in various places in my code, but I will report the simplest case where it occurs, a routine with two loops:

!$acc kernels present(t)
DO j = 1, je
  DO i = 1, ie
    tt_lheat(i,j,kup:klow,nnew) = tt_lheat(i,j,kup:klow,nnew) &
                                + t(i,j,kup:klow,nnew)   ! operator lost in formatting; "+" assumed
  END DO
END DO
!$acc end kernels

And this is what the compiler generates:
99, Generating present(t(:,:,:,:))
Generating copy(tt_lheat(1:ie,1:je,kup:klow,nnew))
Generating local(t(1:ie,1:je,kup:klow,nnew))
Generating compute capability 2.0 binary
100, Loop is parallelizable
101, Loop is parallelizable
102, Loop is parallelizable
Accelerator kernel generated
100, !$acc loop gang, vector(4) ! blockidx%y threadidx%z
101, !$acc loop gang, vector(4) ! blockidx%x threadidx%y
102, !$acc loop vector(16) ! threadidx%x
CC 2.0 : 21 registers; 8 shared, 96 constant, 0 local memory bytes; 83% occupancy

Clearly local(t) is not necessary, and it is actually a problem because I get an error at run time: the compiler seems to generate a free for what it considers the local array (this happens in another subroutine):
unmap dev:0x203ae0200 host:0x22de2b0 size:112 offset:28 data[dev:0x203ae0200 host:0x22de2b0 size:112] (line:2368 name:t$sd)
__pgi_cu_free( 0x203ae0200, lineno=2375, name=t$sd )
call to cuMemFree returned error 700: Launch failed
CUDA driver version: 4020

With version 12.3, the local(t) is instead generated as a copy of the same subarray of t.

Best Regards

Hi Tiziano,

Can you post a reproducing example?

clearly local (t) is not necessary,

Possible, but my best guess is that the compiler is creating a contiguous temporary array to give better cache locality when it expands the innermost implied DO loop.
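One way to test this guess is to write the innermost implied DO loop explicitly, so the compiler has no array section to expand. The following is a minimal sketch, not the poster's actual routine: the array names follow the post, while the extents, the initial values, the "+" operator, and the enclosing data region are assumptions made up for illustration. Whether the local(t) message disappears would need to be confirmed in the -Minfo=accel output.

```fortran
! Sketch only: sizes, values, and the "+" operator are assumptions.
program explicit_k_loop
  implicit none
  integer, parameter :: ie = 8, je = 6, ke = 5, nt = 2
  integer, parameter :: kup = 2, klow = 4, nnew = 2
  real :: t(ie,je,ke,nt), tt_lheat(ie,je,ke,nt)
  integer :: i, j, k

  t = 1.0
  tt_lheat = 2.0

  ! The data region puts both arrays on the device so that
  ! present(...) on the kernels construct can find them.
  !$acc data copyin(t) copy(tt_lheat)
  !$acc kernels present(t, tt_lheat)
  do j = 1, je
    do i = 1, ie
      do k = kup, klow   ! explicit loop replaces the kup:klow array section
        tt_lheat(i,j,k,nnew) = tt_lheat(i,j,k,nnew) + t(i,j,k,nnew)
      end do
    end do
  end do
  !$acc end kernels
  !$acc end data

  ! Host-side check: only levels kup..klow of time level nnew change.
  if (any(tt_lheat(:,:,kup:klow,nnew) /= 3.0)) stop 1
  if (any(tt_lheat(:,:,1,nnew) /= 2.0)) stop 2
  print *, 'ok'
end program explicit_k_loop
```

The loop order is kept as in the post. Built without OpenACC enabled, the directives are treated as comments and the program runs on the host, which makes it easy to compare results between host and device builds.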

__pgi_cu_free( 0x203ae0200, lineno=2375, name=t$sd )

This is freeing the section descriptor, not “t” itself. It may or may not be the point of failure.

call to cuMemFree returned error 700: Launch failed

Typically this means that the kernel aborted abnormally, though the error message doesn’t appear until the next device call, such as a copy or free. Hence, it’s more likely a problem with the kernel than with the free. I will need a complete example, though, to better understand the actual cause.

Note that the most common cause for “error 700” is an out-of-bounds array access or other memory violation.
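A quick way to rule out the out-of-bounds case is to check the loop limits against the array extents on the host before launching the kernel. The sketch below assumes made-up sizes and a hypothetical helper named section_in_bounds; it is not part of the poster's code. Building the host version with run-time bounds checking (for example -Mbounds with the PGI compilers) is another option.

```fortran
! Sketch only: section_in_bounds is a hypothetical helper, and the
! sizes are made up. The loops in the post start at 1 in i and j.
program check_bounds
  implicit none
  integer, parameter :: ie = 8, je = 6, ke = 5, nt = 2
  real, allocatable :: t(:,:,:,:)
  integer :: kup, klow, nnew

  allocate(t(ie,je,ke,nt))
  kup = 2; klow = 4; nnew = 2

  ! The section used in the post's loop nest must fit inside t.
  if (.not. section_in_bounds(t, ie, je, kup, klow, nnew)) stop 1
  ! A section reaching past the last level must be rejected.
  if (section_in_bounds(t, ie, je, kup, ke + 1, nnew)) stop 2
  print *, 'bounds ok'

contains

  ! True if a(1:imax, 1:jmax, k1:k2, n) lies inside the array a.
  ! As an assumed-shape dummy, a always has lower bounds of 1 here.
  logical function section_in_bounds(a, imax, jmax, k1, k2, n)
    real, intent(in) :: a(:,:,:,:)
    integer, intent(in) :: imax, jmax, k1, k2, n
    section_in_bounds = imax <= ubound(a, 1) .and. &
                        jmax <= ubound(a, 2) .and. &
                        k1 >= lbound(a, 3) .and. k2 <= ubound(a, 3) .and. &
                        n >= lbound(a, 4) .and. n <= ubound(a, 4)
  end function section_in_bounds

end program check_bounds
```

If a check like this fails for the real arrays, the kernel is reading or writing outside t or tt_lheat, which would be consistent with the "error 700" reported at the next device call.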

- Mat