kernels seg faults with uneven arrays (managed+deepcopy)

caplanr · September 14, 2017, 7:44pm

Hi,
The following code causes a segfault using PGI 17.7 with “-ta=tesla:cc50,cuda8.0,managed,deepcopy”:

!$acc kernels
        do k=1,np
          v%r(:,1,k)=sum0(:)-v%r(:,2,k)
          v%t(:,1,k)=sumc0(:)*sph(k)-sums0(:)*cph(k)
        enddo
        do k=1,npm1
          v%p(:,1,k)= two*( sums0(:)*sp(k)+sumc0(:)*cp(k) )
     &                  -v%p(:,2,k)
        enddo
!$acc end kernels

The first dimensions of v%r and v%t differ by 1 (nrm versus nr).

I was eventually able to get this to work, but I had to explicitly separate the loops and preload the arrays onto the device, as follows:

!$acc parallel present(sph,cph,sp,cp,v,sums0,sum0,sumc0)
!$acc loop gang worker
        do k=1,np
!$acc loop vector
          do i=1,nrm
            v%r(i,1,k)=sum0(i)-v%r(i,2,k)
          enddo
!$acc loop vector
          do i=1,nr
            v%t(i,1,k)=sumc0(i)*sph(k)-sums0(i)*cph(k)
          enddo
        enddo
!$acc loop gang worker
        do k=1,npm1
!$acc loop vector
          do i=1,nr
            v%p(i,1,k)= two*( sums0(i)*sp(k)+sumc0(i)*cp(k) )
     &                 -v%p(i,2,k)
          enddo
        enddo
!$acc end parallel

I would prefer not to have to change the original compact code.
Do you know what part of the first code was causing the issue? Is it an intrinsically “bad” loop for kernels, or is it a compiler compatibility issue?

MatColgrove · September 15, 2017, 5:36pm

Hi sumseq,

It’s hard to say. “Deepcopy” is brand new and considered a beta feature, so the problem could be there. Though since you were able to work around it by reorganizing the loops, it could be something else.

Does the original code work without deepcopy? How about without managed?

Can you please send a reproducing example to PGI Customer Service (trs@pgroup.com) so we can investigate?

Thanks,
Mat
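
A reproducing example of the kind requested above might look like the following sketch. It is not the original application code: the program name, the type name vvec, the array sizes, the fill values, and the assumption that nrm = nr - 1 are all hypothetical, chosen only so that the first extents of v%r and v%t differ by 1 as described in the first post. It is written in free-form Fortran for brevity, whereas the original snippets are fixed-form.

```fortran
! Hypothetical reproducer sketch; names, sizes, and values are assumptions.
program uneven_kernels
  implicit none
  ! Assumed: nrm = nr - 1, so v%r and v%t differ by 1 in their first extent.
  integer, parameter :: nr = 8, nrm = nr - 1
  integer, parameter :: np = 6, npm1 = np - 1
  real(8), parameter :: two = 2.0d0

  ! Derived type with allocatable members (the case deepcopy targets).
  type :: vvec
    real(8), allocatable :: r(:,:,:), t(:,:,:), p(:,:,:)
  end type vvec

  type(vvec) :: v
  real(8) :: sum0(nrm), sums0(nr), sumc0(nr)
  real(8) :: sph(np), cph(np), sp(npm1), cp(npm1)
  integer :: k

  allocate (v%r(nrm,2,np), v%t(nr,2,np), v%p(nr,2,npm1))

  ! Simple fill values so the host result is easy to check by hand.
  v%r = 1.0d0
  v%t = 0.0d0
  v%p = 1.0d0
  sum0 = 3.0d0
  sums0 = 1.0d0
  sumc0 = 2.0d0
  sph = 0.5d0
  cph = 0.25d0
  sp = 0.5d0
  cp = 0.25d0

  ! Same loop structure as the failing kernels region in the first post.
!$acc kernels
  do k = 1, np
    v%r(:,1,k) = sum0(:) - v%r(:,2,k)
    v%t(:,1,k) = sumc0(:)*sph(k) - sums0(:)*cph(k)
  end do
  do k = 1, npm1
    v%p(:,1,k) = two*(sums0(:)*sp(k) + sumc0(:)*cp(k)) - v%p(:,2,k)
  end do
!$acc end kernels

  ! With the fill values above, a correct run gives 2.0, 0.75, 1.0.
  print *, v%r(1,1,1), v%t(1,1,1), v%p(1,1,1)
end program uneven_kernels
```

Assuming the file is saved as repro.f90 (a hypothetical name), it could be built with pgfortran using the options from the first post, -ta=tesla:cc50,cuda8.0,managed,deepcopy, and then rebuilt with deepcopy removed, and again with managed removed, to narrow down which option is involved. Without managed or deepcopy, the allocatable members of v would likely need explicit data directives. Whether this small case triggers the same segfault is not known.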