NaNs

Hello again,

I’m working on a fairly complicated piece of code and trying to make it GPU-enabled. I’ve already asked some questions about it. Right now I have a problem with Not a Number (NaN) results.
I have a loop that I want to compile and execute on the GPU:

!$acc region do local(ijk,i,j,k), copy(vvect(:,igfy:igfyp1))
	  do ijk=imoj4,imoj5

            if(iffs.eq.0 .and. nf(ijk).ne.0) cycle

            i=i_str(ijk)
            j=j_str(ijk)
            k=k_str(ijk)
c
            include '../comdeck/mijk.f'
            include '../comdeck/pijk.f'

            if(wl.eq.4 .and. i.eq.iprr .and. imax.gt.4) then
              i2jk=ijk_str2unstr(ii2*(k-1)+ii1*(j-1)+2+ii5)
              uhalfp=-dudp(ijk)*(vvect(i2jk,igfy)-vvect(ijk,igfy))
            else
              uhalfp=-dudp(ijk)*(vvect(ipjk,igfy)-vvect(ijk,igfy))
            endif
c
            if(wl.eq.4 .and. i.eq.iprl .and. imax.gt.4) then
              im2jk=ijk_str2unstr(ii2*(k-1)+ii1*(j-1)+im2+ii5)
              uhalfm=-dudp(imjk)*(vvect(ijk,igfy)-vvect(im2jk,igfy))
            else
              uhalfm=-dudp(imjk)*(vvect(ijk,igfy)-vvect(imjk,igfy))
            endif
c
            if(wf.eq.4 .and. j.eq.jprbk .and. jmax.gt.4) then
              ij2k=ijk_str2unstr(ii2*(k-1)+ii1+i+ii5)
              vhalfp=-dvdp(ijk)*(vvect(ij2k,igfy)-vvect(ijk,igfy))
            else
              vhalfp=-dvdp(ijk)*(vvect(ijpk,igfy)-vvect(ijk,igfy))
            endif
c
            if(wf.eq.4 .and. j.eq.jprf .and. jmax.gt.4) then
              ijm2k=ijk_str2unstr(ii2*(k-1)+ii1*(jm2-1)+i+ii5)
              vhalfm=-dvdp(ijmk)*(vvect(ijk,igfy)-vvect(ijm2k,igfy))
            else
              vhalfm=-dvdp(ijmk)*(vvect(ijk,igfy)-vvect(ijmk,igfy))
            endif
c
            if(wb.eq.4 .and. k.eq.kprt .and. kmax.gt.4) then
              ijk2=ijk_str2unstr(ii2+ii1*(j-1)+i+ii5)
              whalfp=-dwdp(ijk)*(vvect(ijk2,igfy)-vvect(ijk,igfy))
            else
              whalfp=-dwdp(ijk)*(vvect(ijkp,igfy)-vvect(ijk,igfy))
            endif
c
            if(wb.eq.4 .and. k.eq.kprb .and. kmax.gt.4) then
              ijkm2=ijk_str2unstr(ii2*(km2-1)+ii1*(j-1)+i+ii5)
              whalfm=-dwdp(ijkm)*(vvect(ijk,igfy)-vvect(ijkm2,igfy))
            else
              whalfm=-dwdp(ijkm)*(vvect(ijk,igfy)-vvect(ijkm,igfy))
            endif

            vvect(ijk,igfyp1)=rri(i)*(rdx(i)*(afr(ijk)*uhalfp/rr(i)-
     1        afr(imjk)*uhalfm/rr(i-1))+
     2        rdy(j)*(afb(ijk)*vhalfp-afb(ijmk)*vhalfm))+
     3        rdz(k)*(aft(ijk)*whalfp-aft(ijkm)*whalfm)
     4        +vf(ijk)*rcsqf(ijk)*rdelt*vvect(ijk,igfy)
              vvect(ijk,igfyp1)=vvect(ijk,igfyp1)*beta(ijk)

	  enddo ! (ijk)

!$acc end region



$pgf95 -DP4 -DWIN32 -c -O3 -mp -Mpreprocess -Bstatic -Mcuda -ta=nvidia -Minfo -Mfixed -V10.9 -Kieee -Ktrap-fp program.F
(...)
             Generating copy(vvect(:,igfy:igfyp1))
(...)

After executing it on the GPU, some elements of the vvect array are NaN. They are not NaN when the code is executed on the CPU.
The funny thing is that when I remove the copy() clause from the code and leave only:

!$acc region do local(ijk,i,j,k)

the resulting array contains only zeros. This is weird because the compiler adds the clause

             Generating copy(vvect(:,igfy:igfyp1))

on its own, so there should not be any difference.

So, any ideas where the NaNs are coming from, and why these two versions of the directive give different results?

I thought about emulating the GPU and writing out all the variables in each iteration, but as I understand it I cannot emulate the GPU with the PGI Accelerator model, right? If I could, I would check all the variables that are used to compute the vvect elements. So, are there ways to check this other than moving from the PGI Accelerator model to CUDA Fortran?
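One low-tech check that stays within the PGI Accelerator model is to scan the input arrays on the host just before the region. If the host values are clean but the GPU results are not, the data transfer becomes the main suspect. Below is a minimal sketch in free-form Fortran. It assumes the compiler provides the Fortran 2003 ieee_arithmetic module (the test x /= x is the usual fallback). The function count_nans and the sample array are hypothetical and not part of the original program.

```fortran
! Sketch: count NaNs in an array on the host before entering the region.
! count_nans and the sample data are hypothetical illustrations.
program nan_scan
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, &
                                           ieee_quiet_nan
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(5)

  a = 1.0_dp
  a(3) = ieee_value(a(3), ieee_quiet_nan)   ! plant one NaN for the demo

  print *, 'NaNs in a: ', count_nans(a)

contains

  ! Return the number of NaN elements in x.
  integer function count_nans(x)
    real(dp), intent(in) :: x(:)
    count_nans = count(ieee_is_nan(x))
  end function count_nans

end program nan_scan
```

In the real code the same call could be applied to each array read inside the loop (dudp, dvdp, dwdp, rcsqf, and so on) over the range imoj4:imoj5.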

We have a saying in Poland: “Who asks does not wander”. So, I’ve asked you and partially solved my problem on my own. ;)
OK, so the NaNs are caused by the rcsqf array, which is used in the calculation of vvect. This array is declared as follows:

      real(kind(zzz)), dimension(:), allocatable, save, target :: rcsqf

The others are declared similarly but without the “target” attribute. I assume there is some problem with pointers. How can I correctly copy the values of the rcsqf array to the GPU?

Hi szczelba,

While I doubt it’s the problem, mixing CUDA Fortran and the PGI Accelerator Model isn’t supported on Windows. So the first thing to try is to remove the “-Mcuda” flag.

$pgf95 -DP4 -DWIN32 -c -O3 -mp -Mpreprocess -Bstatic -ta=nvidia -Minfo -Mfixed -V10.9 -Kieee -Ktrap-fp program.F



The others are declared similarly but without the “target” attribute. I assume there is some problem with pointers. How can I correctly copy the values of the rcsqf array to the GPU?

I don’t see how the target attribute could affect this, but then again it could be a compiler bug.

What is “rcsqf”'s Minfo copy message? What happens if you add “rcsqf” to the region’s copy directive?

Can you send the code to PGI Customer Service (trs@pgroup.com) and ask them to send it to me? If it is a compiler bug, I’d like to send a report to our engineers.

Thanks,
Mat

The copy message is:

   Generating copyin(rcsqf$p(imoj4:imoj5))

I see this “$p” suffix only for this array, which is the only one declared as “target”.
Adding rcsqf to the region’s copy clause does not change anything, not even the copy message above (it stays copyin instead of changing to copy).

When I copy the rcsqf values to another array on the GPU and then write out this temporary array, I get something like:

 'ijk='         1225 ' '    0.000000000000000     
 'ijk='         1226 ' '    0.000000000000000     
 'ijk='         1227 ' '    0.000000000000000     
 'ijk='         1228 ' '    0.000000000000000     
 'ijk='         1229 ' '    0.000000000000000     
 'ijk='         1230 ' '    0.000000000000000     
 'ijk='         1231 ' ' ********************     
 'ijk='         1232 ' '    0.000000000000000     
 'ijk='         1233 ' ' ********************     
 'ijk='         1234 ' ' ********************     
 'ijk='         1235 ' ' ********************     
 'ijk='         1236 ' '                       NaN
 'ijk='         1237 ' '    0.000000000000000     
 'ijk='         1238 ' '    0.000000000000000     
 'ijk='         1239 ' ' ********************     
 'ijk='         1240 ' '    0.000000000000000     
 'ijk='         1241 ' ' ********************     
 'ijk='         1242 ' ' ********************     
 'ijk='         1243 ' '    0.000000000000000     
 'ijk='         1244 ' ' ********************     
 'ijk='         1245 ' ' ********************     
 'ijk='         1246 ' '    0.000000000000000     
 'ijk='         1247 ' ' ********************     
 'ijk='         1248 ' ' ********************     
 'ijk='         1249 ' '    0.000000000000000     
 'ijk='         1250 ' ' ********************     
 'ijk='         1251 ' ' ********************     
 'ijk='         1252 ' ' ********************     
 'ijk='         1253 ' '                       NaN
 'ijk='         1254 ' '    0.000000000000000     
 'ijk='         1255 ' '    0.000000000000000     
 'ijk='         1256 ' ' ********************     
 'ijk='         1257 ' '    0.000000000000000     
 'ijk='         1258 ' ' ********************     
 'ijk='         1259 ' ' ********************     
 'ijk='         1260 ' '    0.000000000000000     
 'ijk='         1261 ' ' ********************     
 'ijk='         1262 ' ' ********************     
 'ijk='         1263 ' '    0.000000000000000     
 'ijk='         1264 ' ' ********************     
 'ijk='         1265 ' '    0.000000000000000     
 'ijk='         1266 ' '    0.000000000000000     
 'ijk='         1267 ' ' ********************     
 'ijk='         1268 ' ' ********************     
 'ijk='         1269 ' ' ********************     
 'ijk='         1270 ' ' ********************     
 'ijk='         1271 ' '    0.000000000000000     
 'ijk='         1272 ' '    0.000000000000000     
 'ijk='         1273 ' ' ********************     
 'ijk='         1274 ' '    0.000000000000000     
 'ijk='         1275 ' ' ********************     
 'ijk='         1276 ' '    0.000000000000000     
 'ijk='         1277 ' '    0.000000000000000     
 'ijk='         1278 ' ' ********************     
 'ijk='         1279 ' ' ********************     
 'ijk='         1280 ' ' ********************     
 'ijk='         1281 ' ' ********************     
 'ijk='         1282 ' '    0.000000000000000     
 'ijk='         1283 ' '    0.000000000000000     
 'ijk='         1284 ' ' ********************     
 'ijk='         1285 ' '    0.000000000000000     
 'ijk='         1286 ' ' ********************     
 'ijk='         1287 ' '    0.000000000000000     
 'ijk='         1288 ' '    0.000000000000000     
 'ijk='         1289 ' ' ********************     
 'ijk='         1290 ' ' ********************     
 'ijk='         1291 ' ' ********************     
 'ijk='         1292 ' ' ********************     
 'ijk='         1293 ' '    0.000000000000000     
 'ijk='         1294 ' '    0.000000000000000     
 'ijk='         1295 ' ' ********************     
 'ijk='         1296 ' '    0.000000000000000     
 'ijk='         1297 ' ' ********************     
 'ijk='         1298 ' ' ********************     
 'ijk='         1299 ' '    0.000000000000000     
 'ijk='         1300 ' ' ********************     
 'ijk='         1301 ' ' ********************     
 'ijk='         1302 ' ' ********************     
 'ijk='         1303 ' '                       NaN
 'ijk='         1304 ' '    0.000000000000000     
 'ijk='         1305 ' '    0.000000000000000     
 'ijk='         1306 ' ' ********************     
 'ijk='         1307 ' '    0.000000000000000     
 'ijk='         1308 ' ' ********************     
 'ijk='         1309 ' ' ********************     
 'ijk='         1310 ' '    0.000000000000000     
 'ijk='         1311 ' ' ********************     
 'ijk='         1312 ' ' ********************     
 'ijk='         1313 ' '    0.000000000000000     
 'ijk='         1314 ' ' ********************     
 'ijk='         1315 ' ' ********************     
 'ijk='         1316 ' '    0.000000000000000     
 'ijk='         1317 ' ' ********************     
 'ijk='         1318 ' ' ********************     
 'ijk='         1319 ' ' ********************     
 'ijk='         1320 ' ' ********************     
 'ijk='         1321 ' '    0.000000000000000     
 'ijk='         1322 ' '    0.000000000000000     
 'ijk='         1323 ' ' ********************     
 'ijk='         1324 ' '    0.000000000000000     
 'ijk='         1325 ' ' ********************     
 'ijk='         1326 ' ' ********************     
 'ijk='         1327 ' '    0.000000000000000     
 'ijk='         1328 ' ' ********************     
 'ijk='         1329 ' ' ********************     
 'ijk='         1330 ' ' ********************     
 'ijk='         1331 ' ' ********************     
 'ijk='         1332 ' '    0.000000000000000     
 'ijk='         1333 ' '    0.000000000000000     
 'ijk='         1334 ' ' ********************     
 'ijk='         1335 ' '    0.000000000000000     
 'ijk='         1336 ' ' ********************     
 'ijk='         1337 ' ' ********************     
 'ijk='         1338 ' '    0.000000000000000     
 'ijk='         1339 ' ' ********************     
 'ijk='         1340 ' ' ********************     
 'ijk='         1341 ' ' ********************     
 'ijk='         1342 ' ' ********************     
 'ijk='         1343 ' '    0.000000000000000     
 'ijk='         1344 ' '    0.000000000000000     
 'ijk='         1345 ' ' ********************     
 'ijk='         1346 ' '    0.000000000000000     
 'ijk='         1347 ' ' ********************     
 'ijk='         1348 ' ' ********************     
 'ijk='         1349 ' '    0.000000000000000     
 'ijk='         1350 ' ' ********************     
 'ijk='         1351 ' ' ********************     
 'ijk='         1352 ' ' ********************     
 'ijk='         1353 ' ' ********************     
 'ijk='         1354 ' '    0.000000000000000     
 'ijk='         1355 ' '    0.000000000000000     
 'ijk='         1356 ' ' ********************     
 'ijk='         1357 ' '    0.000000000000000     
 'ijk='         1358 ' ' ********************     
 'ijk='         1359 ' ' ********************     
 'ijk='         1360 ' '    0.000000000000000     
 'ijk='         1361 ' ' ********************

Besides some NaNs there are also some stars instead of values.
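A side note on the stars: in Fortran formatted output, a field of asterisks normally means the value did not fit the field width, so those entries are probably very large garbage values rather than a separate kind of error. The sketch below shows the effect. It assumes the dump above was written with a fixed-width edit descriptor, which the thread does not state, and the value 1.0e30 is just an illustrative stand-in.

```fortran
! Sketch: a value too wide for an F edit descriptor prints as asterisks.
program star_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x

  x = 1.0e30_dp                             ! illustrative "garbage" magnitude
  write(*,'(a,f20.15)')  'f20.15 : ', x     ! does not fit: field of asterisks
  write(*,'(a,es24.16)') 'es24.16: ', x     ! exponent form shows the magnitude
end program star_demo
```

Writing the temporary array with an exponent format such as es24.16 would show what the starred entries actually contain.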

Sending all the code would be difficult because I’m working on a program that belongs to someone else. I have the source code of only one procedure, and I run it by starting the main program with special parameters. I don’t think I’m allowed to send this code to anybody.

Fortran arrays declared with the target attribute are usually the target of pointer assignments. Look for a pointer assignment, something like

    ptr => rcsqf

where ptr is any Fortran pointer array. If there is a pointer assignment, and the pointer is also used in the accelerator region, there will be a problem. Consider a program like

   real, dimension(:), allocatable, target :: a1
   real, dimension(:), pointer :: p1
   p1 => a1
   !$acc region do
    do i = 1, n
     a1(i) = 0.0
     b(i) = p1(i)
    enddo
   !$acc end region

In the original program, a1 and p1 refer to the same memory locations. However, the accelerator compiler can’t preserve the pointer/target relationship of data that is copied to the GPU, so it will allocate and copy data for a1 and for p1 separately. On the host, p1(i) would get the same value that was just stored by a1(i)=0.0; on the GPU, p1(i) would get uninitialized memory, because the GPU copy of p1 would be at a different place in memory.
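The host-side half of this can be checked with a small standalone program. This is only a sketch of the aliasing behaviour on the CPU, with no accelerator directives. The size n = 4 and the initial value 1.0 are arbitrary choices for illustration.

```fortran
! Sketch: on the host, a pointer and its target share storage, so a
! store through a1 is immediately visible through p1.
program alias_demo
  implicit none
  integer, parameter :: n = 4
  real, dimension(:), allocatable, target :: a1
  real, dimension(:), pointer :: p1
  real :: b(n)
  integer :: i

  allocate(a1(n))
  a1 = 1.0
  p1 => a1              ! p1 and a1 now name the same storage

  do i = 1, n
    a1(i) = 0.0         ! store through the target ...
    b(i) = p1(i)        ! ... and read it back through the pointer
  enddo

  print *, 'b = ', b
  print *, 'all zero on host: ', all(b == 0.0)
end program alias_demo
```

On the CPU every element of b is 0.0. The point of the explanation above is that this guarantee is lost once a1 and p1 become two separate device copies.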

Michael, thanks for your response. I understand what you mean, but I don’t think that is exactly the case here.
I have an array rcsqf declared as:

      real(kind(zzz)), dimension(:), allocatable, save, target :: rcsqf

Since I don’t have access to the full source code, I can only assume that there is some pointer that points to this array. But I’m pretty sure it is not used in the code that I want to execute on the GPU. Moreover, nothing new is written into the rcsqf array during execution on the GPU.

Based on your post and on this line from compilation stage:

  Generating copyin(rcsqf$p(imoj4:imoj5))

I assume that no values from rcsqf were copied to the GPU, only the pointers (note the “$p” suffix at the end of the “rcsqf” name). So this is a slightly different problem. Why would the copyin clause copy only pointers for the target array, given that it is a normally allocated array? The only difference is that some pointer may point to it.
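One experiment that might separate the two possibilities is to copy rcsqf into a plain allocatable array without the target attribute on the host, before the region, and to use only that copy inside the region. The sketch below is untested with PGI 10.9 and is not a confirmed fix. The name rcsqf_tmp, the array size, the save-less declaration, and the loop body are hypothetical stand-ins for the real code.

```fortran
! Sketch: give the accelerator region a non-target host copy of rcsqf.
! rcsqf_tmp, n and the loop body are hypothetical illustrations.
program temp_copy_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8
  real(dp), dimension(:), allocatable, target :: rcsqf      ! as in the original
  real(dp), dimension(:), allocatable         :: rcsqf_tmp  ! plain copy, no target
  real(dp) :: out(n)
  integer :: ijk

  allocate(rcsqf(n), rcsqf_tmp(n))
  do ijk = 1, n
    rcsqf(ijk) = real(ijk, dp)
  enddo

  rcsqf_tmp = rcsqf      ! host-side copy, made before the region

!$acc region copyin(rcsqf_tmp)
  do ijk = 1, n
    out(ijk) = 2.0_dp * rcsqf_tmp(ijk)
  enddo
!$acc end region

  print *, out
end program temp_copy_sketch
```

If the -Minfo message for rcsqf_tmp shows a plain copyin without the “$p” suffix and the NaNs disappear, that would point at how the target array is transferred rather than at the loop itself.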