 # error 702

I am in the process of migrating from the Intel Fortran Compiler to PVF 14.2 so that I can try OpenACC. My system has a GTX 760 and an i7-4770K, and my monitor is currently connected to the GTX 760. After spending a lot of time, I succeeded in compiling the code, but I can’t run it because of error 702. Please help me out.

I wanted to start with something simple. The code below does the following: a 3-tuple (iz,ia,ih) indexes an individual whose preference is measured by “vf”; each individual maximizes its preference by choosing two variables indexed by (iia,iih); “temp_vf” is a large array that stores the measured preference for every possible choice of (iia,iih); the first loop computes “temp_vf”; and the second loop obtains “vf” as the maxval of “temp_vf” over (iia,iih).

At the moment, I am not concerned about data movement; I just want this code to run correctly.

``````
124  subroutine dynamic_decision()
125
126  real(8), dimension(zn,an,hn,an,hn) :: temp_vf
127  real(8), dimension(2)              :: policy_temp
128  real(8) :: c
129  integer :: iz,ia,ih,iia,iih
130
131  !$acc parallel loop
132  do iz = 1, zn
133  do ia = 1, an
134  do ih = 1, hn
135
136      do iia = 1, an
137      do iih = 1, hn
138
139          c = pol_inc(iz,ia,ih) + unitP*( hG(ih)-hG(iih) )   - aG(iia)
140
141          if (c <= 0.0d0) then
142              temp_vf(iz,ia,ih,iia,iih) = -1.0d10
143          else
144              temp_vf(iz,ia,ih,iia,iih) = (   (  c* (hG(ih))  ) ** (1.0d0-sig)    ) &
145                                      + beta * dot_product(zT(iz,:),old_vf(:,iia,iih))
146          end if
147
148      end do
149      end do
150  end do
151  end do
152  end do
153  !$acc end parallel loop
154  !$acc parallel loop
155  do iz = 1, zn
156  do ia = 1, an
157  do ih = 1, hn
158      vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:))
159  end do
160  end do
161  end do
162  !$acc end parallel loop
163
164  end subroutine
``````

Here is the compiler feedback for the accelerator regions:

``````
    131, Accelerator kernel generated
        132, !$acc loop gang ! blockidx%x
        144, !$acc loop vector(256) ! threadidx%x
             Sum reduction generated for zt$r
    131, Generating present_or_copyin(old_vf(:zt$sd+old_vf$sd-1,1:an,1:hn))
         Generating present_or_copyin(zt(1:zn,:))
         Generating present_or_copyin(ag(1:an))
         Generating present_or_copyin(pol_inc(1:zn,1:an,1:hn))
         Generating present_or_copyin(hg(1:hn))
         Generating present_or_copyout(temp_vf(:zn,:an,:hn,:an,:hn))
         Generating Tesla code
    133, Loop is parallelizable
    134, Loop is parallelizable
    136, Loop is parallelizable
    137, Loop is parallelizable
    144, Loop is parallelizable
    154, Accelerator kernel generated
        155, !$acc loop gang ! blockidx%x
        158, !$acc loop vector(256) ! threadidx%x
             Max reduction generated for temp_vf$r
    154, Generating present_or_copyin(temp_vf(:zn,:an,:hn,:an,:hn))
         Generating present_or_copyout(vf(1:zn,1:an,1:hn))
         Generating Tesla code
    156, Loop is parallelizable
    157, Loop is parallelizable
    158, Loop is parallelizable
``````

When I run the code, it fails with error 702. Would anyone kindly let me know how to fix this?

Best,

Hi limtaejun,

I would try copying the whole “old_vf” array over to the device. By default, the compiler tries to move the least amount of data over but might be having a difficult time determining how much to bring over given it’s in a dot product.

131, Generating present_or_copyin(old_vf(:zt$sd+old_vf$sd-1,1:an,1:hn))

I’d also put “temp_vf” in a data region “create” clause so it doesn’t get copied, and explicitly copy in the remaining arrays.

The one caveat with the “copyout” of “vf” is that if the entire array is not updated on the device, the copy back will overwrite some of the host values with garbage. In that case, either change to the “copy” clause or copy out only the updated array section. However, sub-arrays take longer to copy since they can’t be transferred in a contiguous block.

``````
subroutine dynamic_decision()

real(8), dimension(zn,an,hn,an,hn) :: temp_vf
real(8), dimension(2)              :: policy_temp
real(8) :: c
integer :: iz,ia,ih,iia,iih

! temp_vf lives only on the device; inputs are copied in once, vf is copied out once
!$acc data create(temp_vf) copyin(old_vf,zT,hG,aG,pol_inc) copyout(vf)
!$acc parallel loop
do iz = 1, zn
do ia = 1, an
do ih = 1, hn

    do iia = 1, an
    do iih = 1, hn

        c = pol_inc(iz,ia,ih) + unitP*( hG(ih)-hG(iih) )   - aG(iia)

        if (c <= 0.0d0) then
            temp_vf(iz,ia,ih,iia,iih) = -1.0d10
        else
            temp_vf(iz,ia,ih,iia,iih) = (   (  c* (hG(ih))  ) ** (1.0d0-sig)    ) &
                                    + beta * dot_product(zT(iz,:),old_vf(:,iia,iih))
        end if

    end do
    end do
end do
end do
end do
!$acc end parallel loop
!$acc parallel loop
do iz = 1, zn
do ia = 1, an
do ih = 1, hn
    vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:))
end do
end do
end do
!$acc end parallel loop
!$acc end data

end subroutine
``````
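To illustrate the “copyout” caveat, here is a minimal, self-contained sketch. The array “v” and the sizes “n” and “m” are made up for the example and are not part of the original code. Only the first “m” elements are written on the device, so “copy” is used to keep the remaining host values intact; “copyout(v(1:m))” would be the sub-array alternative.

```fortran
program copy_vs_copyout
  implicit none
  ! Hypothetical sizes: the array has n elements, but the loop updates only the first m.
  integer, parameter :: n = 8, m = 4
  real(8) :: v(n)
  integer :: i

  v = 1.0d0   ! host values that must survive for i > m

  ! "copy" sends v to the device before the loop and brings it back afterwards,
  ! so elements the loop never touches return unchanged.
  ! With "copyout(v)" the device copy would start uninitialized and
  ! v(m+1:n) could come back as garbage.
  !$acc parallel loop copy(v)
  do i = 1, m
     v(i) = 2.0d0 * dble(i)
  end do
  !$acc end parallel loop

  print *, v
end program copy_vs_copyout
```

Without an OpenACC-enabled compiler the directives are treated as comments and the program simply runs on the host.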
• Mat

Hi Mat,

Following your suggestions, I modified my code:

``````
subroutine dynamic_decision()

real(8), dimension(zn,an,hn,an,hn) :: temp_vf
real(8) :: c
integer :: iz,ia,ih,iia,iih

!$acc data create(temp_vf) &
!$acc&     copyin(old_vf,zT,aG,hG,pol_inc) &
!$acc&     copyout(vf)
!$acc parallel loop
do iz = 1, zn
do ia = 1, an
do ih = 1, hn

    do iia = 1, an
    do iih = 1, hn

        c = pol_inc(iz,ia,ih) + unitP*( hG(ih)-hG(iih) )   - aG(iia)

        if (c <= 0.0d0) then
            temp_vf(iz,ia,ih,iia,iih) = N_A
        else
            temp_vf(iz,ia,ih,iia,iih) = (   (  c* (hG(ih))  ) ** (1.0d0-sig)    ) &
                                    + beta * dot_product(zT(iz,:),old_vf(:,iia,iih))
        end if

    end do
    end do
end do
end do
end do
!$acc end parallel loop
!$acc parallel loop
do iz = 1, zn
do ia = 1, an
do ih = 1, hn
    vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:))
end do
end do
end do
!$acc end parallel loop
!$acc end data

end subroutine
``````

And, as before, the build was successful:

``````
    130, Generating create(temp_vf(:,:,:,:,:))
         Generating copyin(old_vf(:,:,:))
         Generating copyin(zt(:,:))
         Generating copyin(ag(:))
         Generating copyin(hg(:))
         Generating copyin(pol_inc(:,:,:))
         Generating copyout(vf(:,:,:))
    133, Accelerator kernel generated
        134, !$acc loop gang ! blockidx%x
        146, !$acc loop vector(256) ! threadidx%x
             Sum reduction generated for zt$r
    133, Generating Tesla code
    135, Loop is parallelizable
    136, Loop is parallelizable
    138, Loop is parallelizable
    139, Loop is parallelizable
    146, Loop is parallelizable
    156, Accelerator kernel generated
        157, !$acc loop gang ! blockidx%x
        160, !$acc loop vector(256) ! threadidx%x
             Max reduction generated for temp_vf$r
    156, Generating Tesla code
    158, Loop is parallelizable
    159, Loop is parallelizable
    160, Loop is parallelizable
``````

But when I ran it, I got the same error message as before. After some trial and error, I suspected that the problem comes from the intrinsic functions maxval and dot_product. Indeed, when I rebuild and run the code without these two functions, the error goes away.

Any advice on how to deal with this?

Error 702 is a timeout. My guess is that the problem is not with maxval or dot_product themselves, just that these take longer to compute.

Since you’re on Windows using a GTX, this means you’re using the Windows Display Driver Model (WDDM) driver. WDDM will time out any kernel that runs longer than a few seconds to prevent your monitor from freezing. If you had a Quadro or Tesla you could switch to the Tesla Compute Cluster (TCC) driver, but with a GTX you’re stuck with WDDM.

If you do a web search you can find ways to increase the timeout, but given it means hacking your registry, I wouldn’t recommend it.

• Mat
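A common way to stay under a display-watchdog limit, though not something proposed in this thread, is to break one long-running kernel into several shorter launches. The sketch below is only an illustration under stated assumptions: the sizes are hypothetical and the payoff expression is a placeholder for the real one. It moves the “iz” loop to the host so that each iteration launches its own, shorter kernels inside a single data region. Whether each launch is short enough depends on the actual problem size and GPU.

```fortran
program chunked_kernels
  implicit none
  ! Hypothetical sizes, chosen only to keep the example small.
  integer, parameter :: zn = 4, an = 16, hn = 16
  real(8) :: temp_vf(zn,an,hn,an,hn), vf(zn,an,hn)
  integer :: iz, ia, ih, iia, iih

  ! One data region around all launches: temp_vf stays on the device,
  ! vf is copied back once at the end.
  !$acc data create(temp_vf) copyout(vf)

  ! The iz loop runs on the host; each iteration launches two short kernels
  ! instead of one kernel covering the whole index space.
  do iz = 1, zn

     !$acc parallel loop collapse(4)
     do ia = 1, an
     do ih = 1, hn
     do iia = 1, an
     do iih = 1, hn
        ! Placeholder for the real payoff calculation (assumption).
        temp_vf(iz,ia,ih,iia,iih) = dble(iia + iih) - dble(ia + ih)
     end do
     end do
     end do
     end do

     !$acc parallel loop collapse(2)
     do ia = 1, an
     do ih = 1, hn
        vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:))
     end do
     end do

  end do

  !$acc end data

  print *, vf(1,1,1)
end program chunked_kernels
```

Each pass over “iz” does a fraction of the total work per launch, and the compute regions are synchronous by default, so no single kernel has to cover the full five-dimensional index space.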