Unexpected out-of-bounds crash when using `!$acc kernels loop`

Dear all,

Consider the Fortran code with OpenACC below, which has loops over a shared 3D array and a private 1D array. The code crashes (out-of-bounds) under compute-sanitizer for n > 40 when I use an !$acc kernels loop directive, but runs well when I use an !$acc parallel loop directive. Is this expected due to some bad practice on my end, or does it reveal a compiler bug? The -Minfo=accel output seems to suggest that the generated kernels are equivalent…

To test the code:

for MODE in GOOD BAD; do nvfortran -acc -Minfo=accel -cpp -D_${MODE} test.f90 && compute-sanitizer ./a.out; done

The code:

program p
  implicit none
  integer, parameter :: n = 41 ! works on my Quadro P2000 for n <= 40
  real, allocatable, dimension(:,:,:) :: p2d
  real, allocatable, dimension(:)   :: p1d
  integer :: i,j,k
  !
  allocate(p1d(n))
  allocate(p2d(n,n,n))
  !$acc enter data create(p1d,p2d)
#if defined(_GOOD)
  !$acc parallel loop collapse(3) default(present) private(p1d)
  do k=1,n
    do j=1,n
      do i=1,n
        p1d(i) = 1.*j*k
        p2d(i,j,k) = p1d(i)
      enddo
    enddo
  enddo
  !$acc end parallel
#elif defined(_BAD)
  !$acc kernels loop collapse(3) default(present) private(p1d)
  do k=1,n
    do j=1,n
      do i=1,n
        p1d(i) = 1.*j*k
        p2d(i,j,k) = p1d(i)
      enddo
    enddo
  enddo
  !$acc end kernels
#endif
  !$acc exit data copyout(p2d)
  if( int(p2d(10,9,10)) == 90 ) then
    print*, 'Success!'
  else
    print*, 'Failure!'
  endif
end

Thanks in advance for your feedback! (I posted this question in an OpenACC slack group, but it seems appropriate to ask it here.)

Thanks p.simoes.costa,

I was able to reproduce the issue. What appears to be happening is that, with kernels, the total size of the private array allocation is miscomputed and ends up a bit too small. (The compiler allocates all the private arrays as one large block of memory.) So while the code will run correctly, compute-sanitizer sees that the private array memory space is being accessed out-of-bounds.

I’ve filed a problem report, TPR #32134, and sent it to engineering for investigation.

That said, there are some issues with the example code, and correcting them makes the program work as expected.

First, you shouldn’t have a variable be both shared and private, so I recommend removing “p1d” from the “enter data create” so that it’s only private.

Second, all threads will have a full private copy of “p1d”, but each is only accessing a single element. Hence, if you still want to collapse all three loops, it would be better to make “p1d” a scalar. If you do want it to be an array, you should use “collapse(2)” on the outer loops and optionally add an “!$acc loop” on the “i” loop. For example:

% cat test.F90
program p
  implicit none
  integer, parameter :: n = 41 ! works on my Quadro P2000 for n <= 40
  real, allocatable, dimension(:,:,:) :: p2d
  real, allocatable, dimension(:)   :: p1d
  integer :: i,j,k
  !
  allocate(p1d(n))
  allocate(p2d(n,n,n))
  !$acc enter data create(p2d)
  !$acc kernels loop collapse(2) default(present) private(p1d)
  do k=1,n
    do j=1,n
      !$acc loop
      do i=1,n
        p1d(i) = 1.*j*k
        p2d(i,j,k) = p1d(i)
      enddo
    enddo
  enddo
  !$acc end kernels
  !$acc exit data copyout(p2d)
  if( int(p2d(10,9,10)) == 90 ) then
    print*, 'Success!'
  else
    print*, 'Failure!'
  endif
end
% nvfortran -Minfo=accel -acc test.F90
p:
     10, Generating enter data create(p2d(:,:,:))
     11, Generating default present(p2d(1:41,1:41,1:41))
     12, Loop is parallelizable
     13, Loop is parallelizable
     15, Loop is parallelizable
         Generating NVIDIA GPU code
         12, !$acc loop gang, vector(4) collapse(2) ! blockidx%y threadidx%y
         13,   ! blockidx%y threadidx%y collapsed
         15, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
     22, Generating exit data copyout(p2d(:,:,:))
% compute-sanitizer a.out
========= COMPUTE-SANITIZER
 Success!
========= ERROR SUMMARY: 0 errors
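
For completeness, the scalar alternative mentioned above could look like the sketch below. It has not been run through compute-sanitizer here, so treat it as an illustration only. The name “tmp” is a hypothetical scalar standing in for “p1d”, and the end directive is written in its full “end kernels loop” form. Since each thread only ever needs one value, no private array has to be allocated and all three loops can stay collapsed.

```fortran
program p_scalar
  implicit none
  integer, parameter :: n = 41
  real, allocatable, dimension(:,:,:) :: p2d
  real :: tmp   ! hypothetical scalar replacing the private array p1d
  integer :: i,j,k
  !
  allocate(p2d(n,n,n))
  ! only the shared array lives in the device data region
  !$acc enter data create(p2d)
  ! each thread gets its own copy of the scalar tmp
  !$acc kernels loop collapse(3) default(present) private(tmp)
  do k=1,n
    do j=1,n
      do i=1,n
        tmp = 1.*j*k
        p2d(i,j,k) = tmp
      enddo
    enddo
  enddo
  !$acc end kernels loop
  !$acc exit data copyout(p2d)
  ! same check as in the original example
  if( int(p2d(10,9,10)) == 90 ) then
    print*, 'Success!'
  else
    print*, 'Failure!'
  endif
end program p_scalar
```

It can be built and checked the same way as the example above, e.g. nvfortran -Minfo=accel -acc followed by compute-sanitizer a.out.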

“I posted this question in an OpenACC slack group, but it seems appropriate to ask it here.”

I typically answer these types of questions on the OpenACC slack channel as well, so either way works.

Thanks for the report,
Mat

Thank you very much for reporting and for the valuable feedback about my example! Good to know about my inconsistency regarding the use of the private clause and the data statement.

Thank you also for the suggestion about using a scalar instead of an array. I was actually just trying to test the usage of a private array in an OpenACC loop without thinking much about what it was doing, but I should have chosen a better example.

Pedro
