error 702

I am in the transition period migrating from Intel Fortran Compiler to PVF 14.2 to try openacc. My system has GTX 760 and i7-4770k; and currently, my monitor is connected to GTX 760. After spending lots of time, I succeeded in compiling the code, but couldn’t run it because of the error 702. Please help me out.

I wanted to start from something simple. What the code below does is followings: there’s a 3-tuple indexed by (iz,ia,ih) which is interpreted as an individual with some preference that is measured by “vf”; each individual maximize its preference by choosing two variables indexed by (iia,iih); “temp_vf” is a big array that stores all possible measured preference for each choice of (iia,iih); the first loop executes to calculate “temp_vf”; and the second loop is to obtain “vf” which is just obtained by maxval of “temp_vf”

At the moment, I wouldn’t be concerned by data movement and just hope this code to run well.

124  subroutine dynamic_decision()                              
125
126  real(8), dimension(zn,an,hn,an,hn) :: temp_vf
127  real(8), dimension(2)              :: policy_temp
128  real(8) :: c
129  integer :: iz,ia,ih,iia,iih
130
131  !$acc parallel loop 
132  do iz = 1, zn
133  do ia = 1, an
134  do ih = 1, hn
135
136      do iia = 1, an
137      do iih = 1, hn
138
139          c = pol_inc(iz,ia,ih) + unitP*( hG(ih)-hG(iih) )   - aG(iia)
140
141          if (c <= 0.0d0) then
142              temp_vf(iz,ia,ih,iia,iih) = -1.0d10
143          else
144              temp_vf(iz,ia,ih,iia,iih) = (   (  c* (hG(ih))  ) ** (1.0d0-sig)    ) 
145                                      + beta * dot_product(zT(iz,:),old_vf(:,iia,iih))  
146          end if
147
148      end do
149      end do
150  end do
151  end do
152  end do
153  !$acc end parallel loop 
154  !$acc parallel loop 
155  do iz = 1, zn
156  do ia = 1, an
157  do ih = 1, hn
158      vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:))
159  end do
160  end do
161  end do
162  !$acc end parallel loop 
163
164  end subroutine

To give an extra information on the accelerating region:

    131, Accelerator kernel generated
        132, !$acc loop gang ! blockidx%x
        144, !$acc loop vector(256) ! threadidx%x
               Sum reduction generated for zt$r
    131, Generating present_or_copyin(old_vf(:zt$sd+old_vf$sd-         1,1:an,1:hn))
         Generating present_or_copyin(zt(1:zn,:))
         Generating present_or_copyin(ag(1:an))
         Generating present_or_copyin(pol_inc(1:zn,1:an,1:hn))
         Generating present_or_copyin(hg(1:hn))
         Generating present_or_copyout(temp_vf(:zn,:an,:hn,:an,:hn))
         Generating Tesla code
    133, Loop is parallelizable
    134, Loop is parallelizable
    136, Loop is parallelizable
    137, Loop is parallelizable
    144, Loop is parallelizable
    154, Accelerator kernel generated
        155, !$acc loop gang ! blockidx%x
        158, !$acc loop vector(256) ! threadidx%x
             Max reduction generated for temp_vf$r
    154, Generating present_or_copyin(temp_vf(:zn,:an,:hn,:an,:hn))
           Generating present_or_copyout(vf(1:zn,1:an,1:hn))
           Generating Tesla code
    156, Loop is parallelizable
    157, Loop is parallelizable
    158, Loop is parallelizable

The error I see when running the code:

and also

Would anyone please kindly let me know how to fix this?

Best,

Hi limtaejun,

I would try copying the whole “old_vf” array over to the device. By default, the compiler tries to move the least amount of data over but might be having a difficult time determining how much to bring over given it’s in a dot product.

131, Generating present_or_copyin(old_vf(:zt$sd+old_vf$sd- 1,1:an,1:hn))

I’d also put “temp_vf” in a data region “create” clause so it doesn’t copied and explicitly copy in the remaining arrays.

The one caveat of the “copyout” of “vf” is if the entire array is not updated on the device, this will overwrite some of the host values with garbage values. In this case, either change to using the “copy” clause or copy out only the updated array section. However, sub-arrays take longer to copy since they can’t be transferred in a contiguous block.

124  subroutine dynamic_decision()                              
125 
126  real(8), dimension(zn,an,hn,an,hn) :: temp_vf 
127  real(8), dimension(2)              :: policy_temp 
128  real(8) :: c 
129  integer :: iz,ia,ih,iia,iih 
130 
        !$acc data create(temp_vf) copyin(old_vf,zT,hG,aG,pol_inc) copyout(vf)
131  !$acc parallel loop 
132  do iz = 1, zn 
133  do ia = 1, an 
134  do ih = 1, hn 
135 
136      do iia = 1, an 
137      do iih = 1, hn 
138 
139          c = pol_inc(iz,ia,ih) + unitP*( hG(ih)-hG(iih) )   - aG(iia) 
140 
141          if (c <= 0.0d0) then 
142              temp_vf(iz,ia,ih,iia,iih) = -1.0d10 
143          else 
144              temp_vf(iz,ia,ih,iia,iih) = (   (  c* (hG(ih))  ) ** (1.0d0-sig)    ) 
145                                      + beta * dot_product(zT(iz,:),old_vf(:,iia,iih))  
146          end if 
147 
148      end do 
149      end do 
150  end do 
151  end do 
152  end do 
153  !$acc end parallel loop 
154  !$acc parallel loop 
155  do iz = 1, zn 
156  do ia = 1, an 
157  do ih = 1, hn 
158      vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:)) 
159  end do 
160  end do 
161  end do 
162  !$acc end parallel loop 
        !$acc end data
163 
164  end subroutine
  • Mat

Hi Mat,

Following your suggestions, I modified my code:

subroutine dynamic_decision()                              

real(8), dimension(zn,an,hn,an,hn) :: temp_vf
real(8) :: c
integer :: iz,ia,ih,iia,iih

!$acc data create(temp_vf) & 
!$acc&     copyin(old_vf,zT,aG,hG,pol_inc) &
!$acc&     copyout(vf)     
!$acc parallel loop 
do iz = 1, zn
do ia = 1, an
do ih = 1, hn
    
    do iia = 1, an
    do iih = 1, hn

        c = pol_inc(iz,ia,ih) + unitP*( hG(ih)-hG(iih) )   - aG(iia)

        if (c <= 0.0d0) then
            temp_vf(iz,ia,ih,iia,iih) = N_A
        else
            temp_vf(iz,ia,ih,iia,iih) = (   (  c* (hG(ih))  ) ** (1.0d0-sig)    ) 
                                     + beta * dot_product(zT(iz,:),old_vf(:,iia,iih))  
        end if

    end do
    end do
end do
end do
end do
!$acc end parallel loop
!$acc parallel loop
do iz = 1, zn
do ia = 1, an
do ih = 1, hn
    vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:))
end do
end do
end do
!$acc end parallel loop
!$acc end data

end subroutine

And, as before, the build was successful:

    130, Generating create(temp_vf(:,:,:,:,:))
         Generating copyin(old_vf(:,:,:))
         Generating copyin(zt(:,:))
         Generating copyin(ag(:))
         Generating copyin(hg(:))
         Generating copyin(pol_inc(:,:,:))
         Generating copyout(vf(:,:,:))
    133, Accelerator kernel generated
        134, !$acc loop gang ! blockidx%x
        146, !$acc loop vector(256) ! threadidx%x
             Sum reduction generated for zt$r
    133, Generating Tesla code
    135, Loop is parallelizable
    136, Loop is parallelizable
    138, Loop is parallelizable
    139, Loop is parallelizable
    146, Loop is parallelizable
    156, Accelerator kernel generated
        157, !$acc loop gang ! blockidx%x
        160, !$acc loop vector(256) ! threadidx%x
             Max reduction generated for temp_vf$r
    156, Generating Tesla code
    158, Loop is parallelizable
    159, Loop is parallelizable
    160, Loop is parallelizable

But when being run, it resulted in the same error message as before. After doing some test of trials and errors, I thought that all came from the intrinsic function of maxval and dotproduct. Indeed, when I rebuild and run the code not using these two functions, the errors are gone.

Any advise how to deal with this?

Error 702 is a timeout. My guess that the problem is not with maxval or dot_product, just that these take longer to compute.

Since you’re on Windows using a GTX, this means you’re using the Windows Display Device Monitor (WDDM). WDDM will timeout your job after a few seconds to prevent freezing of your monitor. If you had a Quadro or Tesla you could change to using the Tesla Compute Cluster (TCC) driver, but with GTX you’re stuck with WDDM.

If you do a web search you can find ways to increase the timeout, but given it means hacking your registry, I wouldn’t recommend it.

  • Mat