kernels implicit copy with managed memory issue

This snippet of code segfaults when compiled with PGI 17.7 using “-ta=tesla:cuda8.0,cc50,managed,deepcopy”:

!$acc kernels
        sum0(:)=buf0(1:nrm)
        sums0(:)=buf0(nrm+1:nrm+nr)
        sumc0(:)=buf0(nrm+nr+1:nrm+nr+nr)
!$acc end kernels

But if I manually copy “buf0” to the device (or use the copy clause), everything works fine:

!$acc kernels present(buf0)
        sum0(:)=buf0(1:nrm)
        sums0(:)=buf0(nrm+1:nrm+nr)
        sumc0(:)=buf0(nrm+nr+1:nrm+nr+nr)
!$acc end kernels

I am confused because I thought that using “managed” made the compiler ignore all present, copy, etc. clauses for allocatable arrays and rely on the CUDA driver instead?

(In the above code, buf0 is allocatable, while the sum0 arrays are static.)

I am confused because I thought that using “managed” made the compiler ignore all present, copy, etc. clauses for allocatable arrays and rely on the CUDA driver instead?

Not quite. The only thing that managed really does is replace the underlying malloc calls with calls to cudaMallocManaged. The compiler still generates all the implicit copy directives, because static arrays and scalars still need to be handled, and when an array is passed into a subroutine the compiler doesn’t know whether it is static or dynamically allocated. At runtime, the managed memory is then detected as present.
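To make that concrete, here is a minimal sketch of the same pattern with the data clauses written out explicitly. It is not the original code: the program name, the sizes nrm and nr, and the fill values are assumptions for illustration only. The static arrays get ordinary copyout handling, while buf0 is allocatable, so with the managed option its allocation would be expected to come from cudaMallocManaged and be found present at runtime.

```fortran
! Sketch only: program name, sizes, and fill values are assumed for illustration.
program managed_sketch
  implicit none
  integer, parameter :: nrm = 4, nr = 3      ! assumed sizes
  real :: sum0(nrm), sums0(nr), sumc0(nr)    ! static arrays: still need data movement
  real, allocatable :: buf0(:)               ! allocatable: managed memory when built with "managed"
  integer :: i

  allocate(buf0(nrm + 2*nr))
  do i = 1, size(buf0)
     buf0(i) = real(i)
  end do

  ! Explicit versions of the clauses the compiler would otherwise add implicitly.
  ! For buf0 the runtime is expected to detect the managed allocation as present;
  ! the static arrays are copied back to the host as usual.
  !$acc kernels copyin(buf0) copyout(sum0, sums0, sumc0)
  sum0(:)  = buf0(1:nrm)
  sums0(:) = buf0(nrm+1:nrm+nr)
  sumc0(:) = buf0(nrm+nr+1:nrm+nr+nr)
  !$acc end kernels

  print *, sum0
  print *, sums0
  print *, sumc0

  deallocate(buf0)
end program managed_sketch
```

Built without OpenACC enabled, the directives are treated as comments and the program runs on the host, which gives a reference result to compare against the accelerated build.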

Unless the buf and sum variables are UDTs (user-defined types), deepcopy shouldn’t matter, though I’m not sure what the actual issue is.

Can you please write up a small reproducing example that shows the issue? If a small example is difficult to create, please send the full example to PGI Customer Service (trs@pgroup.com) so we can investigate.

Thanks,
Mat

Some further information that might help:

The sum0, sums0, and sumc0 arrays are static, while “buf0” is allocatable.