Hi everyone, this is my first post here.
I’m compiling a Fortran code with mpif90 (nvhpc 22.3). The code is very complex, so for the sake of clarity I’m reporting only the parts relevant to my issue. The code uses MPI and I’m trying to accelerate it with OpenACC. This is my first attempt with a serious code; so far I have only used OpenACC directives to accelerate simple codes.
The part of the code that I’m trying to accelerate is the following:
SUBROUTINE kin
  USE global_mod, ONLY: NsMAX, num_zones, zones, MINi, MAXi, MINj, MAXj, MINk, MAXk
  USE common_alloc
  INTEGER, VALUE :: B, i, j, k, s
  DOUBLE PRECISION, VALUE :: Yi_ijk(NsMAX)
  DOUBLE PRECISION, VALUE :: T_ijk, p_ijk, S_y
  !--------------------------------------------------------------------------------------------------------
  !$acc data copy(i,j,k,p,p_ijk,T,T_ijk,Yi,Yi_ijk,s,S_y,NsMAX)
  !$acc parallel loop private(i,j,k,s)
  do k = MINk(BBB)-(Ghost-1), MAXk(BBB)+(Ghost-1)
    do j = MINj(BBB)-(Ghost-1), MAXj(BBB)+(Ghost-1)
      do i = MINi(BBB)-(Ghost-1), MAXi(BBB)+(Ghost-1)
        p_ijk = p(i,j,k)
        T_ijk = T(i,j,k)
        Yi_ijk = Yi(:,i,j,k) + 1.0d-20
        S_y = 0.0D0
        do s = 1, NsMAX
          S_y = S_y + Yi_ijk(s)
        enddo
        do s = 1, NsMAX
          if (S_y /= 0.D0) then
            Yi_ijk(s) = Yi_ijk(s) / S_y
          endif
        enddo
      end do
    end do
  end do
  !$acc end data
END SUBROUTINE kin
I compile the code with:
mpif90 -c -r8 -acc=gpu -target=gpu -gpu=ccall -Mpreprocess -Mfree -Mextend -Munixlogical -Mbyteswapio -traceback -Mchkptr -Mipa=ptr -Mipa=alias -Mipa=f90ptr -Mchkstk -Mnostack_arrays -Mnofprelaxed -Mnofpapprox -Minfo=accel kin.f90
This is part of a bigger code with hundreds of files and modules.
The issue I’m dealing with is the following compiler output:
957, Generating copy(i,j,nsmax,p(:,:,:),s_y,t(:,:,:),t_ijk,p_ijk,s,k,yi_ijk(:)) [if not already present]
958, Generating NVIDIA GPU code
    959, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
    961, !$acc loop seq
    963, !$acc loop seq
    968, !$acc loop seq
    971, !$acc loop seq
    974, !$acc loop seq
958, Generating implicit copyin(maxk(bbb),maxi(bbb),mini(bbb),maxj(bbb),mink(bbb),minj(bbb)) [if not already present]
961, Complex loop carried dependence of yi prevents parallelization
     Loop carried dependence of yi_ijk prevents parallelization
     Loop carried dependence of yi_ijk prevents vectorization
     Loop carried backward dependence of yi_ijk prevents vectorization
     Complex loop carried dependence of yi_ijk prevents parallelization
963, Complex loop carried dependence of yi prevents parallelization
     Loop carried dependence of yi_ijk prevents parallelization
     Loop carried backward dependence of yi_ijk prevents vectorization
968, Reference argument passing prevents parallelization:
     Complex loop carried dependence of yi prevents parallelization
972, Reference argument passing prevents parallelization:
976, Reference argument passing prevents parallelization:
“Yi” is defined in a module called “common_alloc” as follows:
double precision, dimension(:,:,:,:), pointer :: Yi
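To make the pattern easier to discuss outside the full code, here is a minimal standalone sketch of the same loop nest. It is not the real code: the program name kin_sketch, the fixed sizes ni, nj, nk, the initial values, and the use of allocatable arrays instead of the module pointers are all assumptions made for illustration. It also writes the normalized values back into Yi so the result can be checked, which the subroutine above does not do. The private clause on the per-cell temporaries and collapse(3) are a guess at one commonly suggested direction for the "loop carried dependence of yi_ijk" messages, not a confirmed fix for the real code.

```fortran
! Standalone sketch of the loop pattern in kin; NOT the real code.
! Assumptions (hypothetical): program name, array sizes, initial values,
! allocatable arrays in place of the module pointer arrays, fixed bounds.
program kin_sketch
  implicit none
  integer, parameter :: NsMAX = 4, ni = 8, nj = 8, nk = 8
  double precision, allocatable :: p(:,:,:), T(:,:,:), Yi(:,:,:,:)
  double precision :: Yi_ijk(NsMAX), p_ijk, T_ijk, S_y
  integer :: i, j, k, s

  allocate(p(ni,nj,nk), T(ni,nj,nk), Yi(NsMAX,ni,nj,nk))
  p  = 1.0d5
  T  = 300.0d0
  Yi = 0.25d0

  ! Only the arrays appear in the data clauses. The scalars and the scratch
  ! array Yi_ijk are private, so each iteration works on its own copy.
  !$acc data copy(Yi) copyin(p, T)
  !$acc parallel loop collapse(3) private(Yi_ijk, p_ijk, T_ijk, S_y)
  do k = 1, nk
    do j = 1, nj
      do i = 1, ni
        p_ijk = p(i,j,k)
        T_ijk = T(i,j,k)
        S_y = 0.0d0
        ! Copy one cell into the private scratch array and accumulate the sum.
        !$acc loop seq
        do s = 1, NsMAX
          Yi_ijk(s) = Yi(s,i,j,k) + 1.0d-20
          S_y = S_y + Yi_ijk(s)
        end do
        ! Normalize and store back (the write-back is an assumption).
        if (S_y /= 0.0d0) then
          !$acc loop seq
          do s = 1, NsMAX
            Yi(s,i,j,k) = Yi_ijk(s) / S_y
          end do
        end if
      end do
    end do
  end do
  !$acc end data

  print *, 'sum of Yi at (1,1,1) =', sum(Yi(:,1,1,1))
end program kin_sketch
```

Built without OpenACC enabled, the directives are treated as comments and the program runs serially, which gives a reference result to compare against a build with -acc -Minfo=accel.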
I’ve tried different approaches to solve this issue and also looked for solutions in the forum. Maybe I have not understood the messages at all. Does anyone have any idea how I can solve it?
Thanks in advance to all!!