From the code below, I’d like to obtain the array ‘VF’. The other arrays, ‘temp_c’ and ‘temp_VF’, are only temporaries that are not needed afterwards, so I know they can be placed in a create clause instead of being copied. The problem is that ‘an’, which is both the trip count of the outermost loop and a dimension of ‘temp_c’ and ‘temp_VF’, is so large that I get an out-of-memory error on the GPU.
!$acc parallel loop collapse(4) gang worker vector
do ia = 1, an
  do ie = 1, en
    do ip = 1, pn
      do im = 1, mn
        do iet = 1, etn
          do iaa = 1, an
            temp_c(ia,ie,ip,im,iet,iaa) = aG(ia) + eG(ie) + pG(ip) + mG(im) + etG(iet) - aG(iaa)
            if ( temp_c(ia,ie,ip,im,iet,iaa) < 0.0d0 ) then
              temp_VF(ia,ie,ip,im,iet,iaa) = -1.0d10
            else
              temp_VF(ia,ie,ip,im,iet,iaa) = temp_c(ia,ie,ip,im,iet,iaa)**0.5d0
            end if
          end do
        end do
      end do
    end do
  end do
end do
!$acc end parallel loop
!$acc parallel loop collapse(3) gang worker vector
do ia = 1, an
  do ie = 1, en
    do ip = 1, pn
      do im = 1, mn
        do iet = 1, etn
          VF(ia,ie,ip,im,iet) = maxval(temp_VF(ia,ie,ip,im,iet,:))
        end do
      end do
    end do
  end do
end do
!$acc end parallel loop
Thus, I modified the code as follows:
!$acc kernels loop
do ia = 1, an
  do ie = 1, en
    do ip = 1, pn
      do im = 1, mn
        !$acc loop private(temp_c, temp_VF)
        do iet = 1, etn
          ! temp_c is now a scalar and temp_VF a 1-D array indexed by iaa only
          do iaa = 1, an
            temp_c = aG(ia) + eG(ie) + pG(ip) + mG(im) + etG(iet) - aG(iaa)
            if ( temp_c < 0.0d0 ) then
              temp_VF(iaa) = -1.0d10
            else
              temp_VF(iaa) = temp_c**0.5d0
            end if
          end do
          ! the maximum has to be taken inside the iet loop, where iet is still defined
          VF(ia,ie,ip,im,iet) = maxval(temp_VF)
        end do
      end do
    end do
  end do
end do
I’m not sure whether this is an efficient way to avoid the out-of-memory error. Any suggestions would be much appreciated.
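For concreteness, here is a self-contained sketch of one further variant I have been considering: since only the maximum over ‘iaa’ is ever used, the temporary array could perhaps be replaced by a running maximum held in a scalar, so that nothing of size ‘an’ needs to be private per thread. The grid sizes and grid values below are made-up placeholders, ‘v’ and ‘vmax’ are names I introduced for illustration, and the host-side loop at the end exists only to check the result against the temp_VF/maxval formulation. I have not measured whether this is actually faster.

```fortran
program vf_running_max
  implicit none
  ! Hypothetical, deliberately small grid sizes so the sketch runs quickly.
  integer, parameter :: an = 8, en = 3, pn = 3, mn = 2, etn = 2
  double precision :: aG(an), eG(en), pG(pn), mG(mn), etG(etn)
  double precision :: VF(an,en,pn,mn,etn)
  double precision :: temp_VF(an)      ! used only by the host-side check
  double precision :: temp_c, v, vmax  ! v and vmax are illustrative names
  integer :: ia, ie, ip, im, iet, iaa

  ! Placeholder grid values (assumed for illustration only).
  aG  = [(0.5d0  * dble(ia - 1), ia  = 1, an)]
  eG  = [(0.1d0  * dble(ie),     ie  = 1, en)]
  pG  = [(0.2d0  * dble(ip),     ip  = 1, pn)]
  mG  = [(0.05d0 * dble(im),     im  = 1, mn)]
  etG = [(-0.3d0 * dble(iet),    iet = 1, etn)]

  ! One thread per VF entry; each thread scans iaa sequentially and keeps
  ! only a scalar running maximum, so no temporary array is needed.
  !$acc parallel loop collapse(5) private(temp_c, v, vmax) copyin(aG, eG, pG, mG, etG) copyout(VF)
  do ia = 1, an
    do ie = 1, en
      do ip = 1, pn
        do im = 1, mn
          do iet = 1, etn
            vmax = -1.0d10
            !$acc loop seq
            do iaa = 1, an
              temp_c = aG(ia) + eG(ie) + pG(ip) + mG(im) + etG(iet) - aG(iaa)
              if ( temp_c < 0.0d0 ) then
                v = -1.0d10
              else
                v = temp_c**0.5d0
              end if
              vmax = max(vmax, v)
            end do
            VF(ia,ie,ip,im,iet) = vmax
          end do
        end do
      end do
    end do
  end do

  ! Host-side check against the temp_VF/maxval formulation.
  do ia = 1, an
    do ie = 1, en
      do ip = 1, pn
        do im = 1, mn
          do iet = 1, etn
            do iaa = 1, an
              temp_c = aG(ia) + eG(ie) + pG(ip) + mG(im) + etG(iet) - aG(iaa)
              if ( temp_c < 0.0d0 ) then
                temp_VF(iaa) = -1.0d10
              else
                temp_VF(iaa) = temp_c**0.5d0
              end if
            end do
            if ( abs(VF(ia,ie,ip,im,iet) - maxval(temp_VF)) > 1.0d-12 ) error stop 'mismatch'
          end do
        end do
      end do
    end do
  end do

  print *, 'check passed, VF(1,1,1,1,1) =', VF(1,1,1,1,1)
end program vf_running_max
```

Without OpenACC enabled the directives are just comments, so the same file also builds and runs serially, which makes it easy to compare against the GPU build.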