nvfortran: Internal compiler error in OpenMP reduction

Dear all.
The following attempt to compute an inner product of a three-dimensional array via OpenMP offloading ends in an internal compiler error with nvfortran 23.9 and 23.11:

program reduc3d
implicit none
integer, parameter :: f64 = selected_real_kind(9,40)

integer, parameter :: nx = 16
integer, parameter :: ny = 16
integer, parameter :: nz = 16

real(kind=f64), dimension(:, :, :), allocatable :: r 
real(kind=f64) :: rn2
integer :: i, j, k

allocate(r(nz+2, ny+2, nx+2)) 
r  = 0.0

!$omp target data map(to:r)

!$omp target 
!$omp loop collapse(3)
 do  k = 2, nx+1 
   do j = 2, ny+1
      do i = 2, nz+1  
         r(i,j,k)    = 1.0
      end do  
   end do 
end do
!$omp end loop
 rn2 = 0.0
!$omp end target 
!$omp target teams distribute parallel do collapse(3)  reduction(+:rn2)
 do  k = 2, nx+1 
   do j = 2, ny+1
      do i = 2, nz+1  
         rn2 = rn2+ r(i,j,k)*r(i,j,k)
      end do  
   end do 
end do 
!$omp end target teams distribute parallel do

!$omp end target data

write(*,*) "which gives rn2   = ",rn2

deallocate(r)

end program reduc3d

The message is

NVFORTRAN-F-0000-Internal compiler error. unexpected ILM for reduction op      10  (reduc_min.f90: 33)

Any insight will be appreciated.
Thank you for your attention. Frank

Hi Frank,

Thanks for the report and the great example. I filed a problem report, TPR #34567, and sent it to engineering for review.

It seems to be triggered by the interaction between the first set of do loops, which use “loop”/“end loop” within a target region, and the second set of do loops, which use “target teams distribute parallel do” with a reduction. Either removing the “end loop” or changing the second set of do loops to use the “loop” directive seems to work around the issue.

For example,
test.F90

program reduc3d
implicit none
integer, parameter :: f64 = selected_real_kind(9,40)

integer, parameter :: nx = 16
integer, parameter :: ny = 16
integer, parameter :: nz = 16

real(kind=f64), dimension(:, :, :), allocatable :: r
real(kind=f64) :: rn2
integer :: i, j, k

allocate(r(nz+2, ny+2, nx+2))
r  = 0.0

!$omp target data map(to:r)

!$omp target
!$omp loop collapse(3)
 do  k = 2, nx+1
   do j = 2, ny+1
      do i = 2, nz+1
         r(i,j,k)    = 1.0
      end do
   end do
 end do
#ifndef NO_END_LOOP
!$omp end loop
#endif
 rn2 = 0.0
!$omp end target

#ifdef USE_LOOP
!$omp target teams loop collapse(3)  reduction(+:rn2)
#else
!$omp target teams distribute parallel do collapse(3)  reduction(+:rn2)
#endif
 do  k = 2, nx+1
   do j = 2, ny+1
      do i = 2, nz+1
         rn2 = rn2+ r(i,j,k)*r(i,j,k)
      end do
   end do
end do

!$omp end target data

write(*,*) "which gives rn2   = ",rn2

deallocate(r)

end program reduc3d

% nvfortran -mp=gpu test.F90 -fast -V23.11
NVFORTRAN-F-0000-Internal compiler error. unexpected ILM for reduction op      10  (test.F90: 39)
% nvfortran -mp=gpu test.F90 -fast -V23.11 -DUSE_LOOP
% nvfortran -mp=gpu test.F90 -fast -V23.11 -DNO_END_LOOP
% a.out
 which gives rn2   =     4096.000
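
As a sanity check on that value: every interior element of r is 1.0 and there are 16 x 16 x 16 of them, so the sum of squares should be 4096. A minimal host-only sketch of the same computation, with no OpenMP directives, can confirm this independently of the offload path. It is not part of the original test case, and the program name reduc3d_ref is made up for illustration.

```fortran
program reduc3d_ref
  ! Hypothetical serial reference for the reduction in test.F90.
  ! No OpenMP directives, so it builds with a plain "nvfortran" invocation.
  implicit none
  integer, parameter :: f64 = selected_real_kind(9, 40)
  integer, parameter :: nx = 16, ny = 16, nz = 16

  real(kind=f64), allocatable :: r(:, :, :)
  real(kind=f64) :: rn2
  integer :: i, j, k

  allocate(r(nz+2, ny+2, nx+2))
  r = 0.0_f64                            ! halo cells stay zero
  r(2:nz+1, 2:ny+1, 2:nx+1) = 1.0_f64    ! interior cells set to one

  ! Same loop nest as the offloaded reduction, executed on the host.
  rn2 = 0.0_f64
  do k = 2, nx+1
    do j = 2, ny+1
      do i = 2, nz+1
        rn2 = rn2 + r(i, j, k)*r(i, j, k)
      end do
    end do
  end do

  ! The sum should equal the number of interior points, nx*ny*nz.
  write(*,*) "serial rn2 = ", rn2, " expected = ", real(nx*ny*nz, f64)

  deallocate(r)
end program reduc3d_ref
```

Both numbers printed by this sketch should agree with the GPU result above.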

-Mat

Dear Mat, thank you very much for your reply.
I confirm that everything works fine with your modifications in place. (Tested with 23.11).
Best regards, Frank
