Dear all.
The following attempt to compute the inner product of a three-dimensional array via OpenMP offloading ends in an internal compiler error with nvfortran 23.9 and 23.11:
program reduc3d
  implicit none
  integer, parameter :: f64 = selected_real_kind(9,40)
  integer, parameter :: nx = 16
  integer, parameter :: ny = 16
  integer, parameter :: nz = 16
  real(kind=f64), dimension(:, :, :), allocatable :: r
  real(kind=f64) :: rn2
  integer :: i, j, k
  allocate(r(nz+2, ny+2, nx+2))
  r = 0.0
  !$omp target data map(to:r)
  !$omp target
  !$omp loop collapse(3)
  do k = 2, nx+1
    do j = 2, ny+1
      do i = 2, nz+1
        r(i,j,k) = 1.0
      end do
    end do
  end do
  !$omp end loop
  rn2 = 0.0
  !$omp end target
  !$omp target teams distribute parallel do collapse(3) reduction(+:rn2)
  do k = 2, nx+1
    do j = 2, ny+1
      do i = 2, nz+1
        rn2 = rn2 + r(i,j,k)*r(i,j,k)
      end do
    end do
  end do
  !$omp end target teams distribute parallel do
  !$omp end target data
  write(*,*) "which gives rn2 = ",rn2
  deallocate(r)
end program reduc3d
The error message is:
NVFORTRAN-F-0000-Internal compiler error. unexpected ILM for reduction op 10 (reduc_min.f90: 33)
Any insight will be appreciated.
Thank you for your attention. Frank
Thanks for the report and the great example. I filed a problem report, TPR #34567, and sent it to engineering for review.
The error seems to be triggered by the interaction between the first set of do loops, which uses a “loop”/“end loop” pair inside a target region, and the second set of do loops, which uses “target teams distribute parallel do” with a reduction. Either removing the “end loop” directive or changing the second set of do loops to use the “loop” directive seems to work around the issue.
For example,
test.F90
program reduc3d
  implicit none
  integer, parameter :: f64 = selected_real_kind(9,40)
  integer, parameter :: nx = 16
  integer, parameter :: ny = 16
  integer, parameter :: nz = 16
  real(kind=f64), dimension(:, :, :), allocatable :: r
  real(kind=f64) :: rn2
  integer :: i, j, k
  allocate(r(nz+2, ny+2, nx+2))
  r = 0.0
  !$omp target data map(to:r)
  !$omp target
  !$omp loop collapse(3)
  do k = 2, nx+1
    do j = 2, ny+1
      do i = 2, nz+1
        r(i,j,k) = 1.0
      end do
    end do
  end do
#ifndef NO_END_LOOP
  !$omp end loop
#endif
  rn2 = 0.0
  !$omp end target
#ifdef USE_LOOP
  !$omp target teams loop collapse(3) reduction(+:rn2)
#else
  !$omp target teams distribute parallel do collapse(3) reduction(+:rn2)
#endif
  do k = 2, nx+1
    do j = 2, ny+1
      do i = 2, nz+1
        rn2 = rn2 + r(i,j,k)*r(i,j,k)
      end do
    end do
  end do
  !$omp end target data
  write(*,*) "which gives rn2 = ",rn2
  deallocate(r)
end program reduc3d
% nvfortran -mp=gpu test.F90 -fast -V23.11
NVFORTRAN-F-0000-Internal compiler error. unexpected ILM for reduction op 10 (test.F90: 39)
% nvfortran -mp=gpu test.F90 -fast -V23.11 -DUSE_LOOP
% nvfortran -mp=gpu test.F90 -fast -V23.11 -DNO_END_LOOP
% a.out
which gives rn2 = 4096.000
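For anyone who prefers not to carry the preprocessor switches around, here is a sketch of what the -DUSE_LOOP build amounts to once the #ifdef blocks are resolved. It is the same program with only the second directive changed to the “loop” form. The program name reduc3d_loop is just an illustrative label, and this variant has only been reported to compile with 23.11 in this thread, so treat it as a workaround to try rather than a guaranteed fix.

```fortran
program reduc3d_loop   ! illustrative name, not from the original report
  implicit none
  integer, parameter :: f64 = selected_real_kind(9,40)
  integer, parameter :: nx = 16
  integer, parameter :: ny = 16
  integer, parameter :: nz = 16
  real(kind=f64), dimension(:, :, :), allocatable :: r
  real(kind=f64) :: rn2
  integer :: i, j, k

  allocate(r(nz+2, ny+2, nx+2))
  r = 0.0

  !$omp target data map(to:r)

  ! First region: unchanged from test.F90, including the "end loop".
  !$omp target
  !$omp loop collapse(3)
  do k = 2, nx+1
    do j = 2, ny+1
      do i = 2, nz+1
        r(i,j,k) = 1.0
      end do
    end do
  end do
  !$omp end loop
  rn2 = 0.0
  !$omp end target

  ! Second region: "target teams loop" replaces
  ! "target teams distribute parallel do" (the -DUSE_LOOP branch).
  !$omp target teams loop collapse(3) reduction(+:rn2)
  do k = 2, nx+1
    do j = 2, ny+1
      do i = 2, nz+1
        rn2 = rn2 + r(i,j,k)*r(i,j,k)
      end do
    end do
  end do

  !$omp end target data

  write(*,*) "which gives rn2 = ",rn2
  deallocate(r)
end program reduc3d_loop
```

Since this version contains no preprocessor directives, it can be saved with a plain .f90 extension and built with the same nvfortran -mp=gpu command line as above.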
Dear Mat, thank you very much for your reply.
I confirm that everything works fine with your modifications in place. (Tested with 23.11).
Best regards, Frank