NaNs

szczelba · October 27, 2010, 9:05am

Hello again,

I’m working on a fairly complicated piece of code and trying to make it GPU-enabled. I’ve already asked some questions about it. Right now I have a problem with Not a Number (NaN) results.
I have a loop that I want to compile and execute on the GPU:

!$acc region do local(ijk,i,j,k), copy(vvect(:,igfy:igfyp1))
          do ijk=imoj4,imoj5

            if(iffs.eq.0 .and. nf(ijk).ne.0) cycle

            i=i_str(ijk)
            j=j_str(ijk)
            k=k_str(ijk)
c
            include '../comdeck/mijk.f'
            include '../comdeck/pijk.f'

            if(wl.eq.4 .and. i.eq.iprr .and. imax.gt.4) then
              i2jk=ijk_str2unstr(ii2*(k-1)+ii1*(j-1)+2+ii5)
              uhalfp=-dudp(ijk)*(vvect(i2jk,igfy)-vvect(ijk,igfy))
            else
              uhalfp=-dudp(ijk)*(vvect(ipjk,igfy)-vvect(ijk,igfy))
            endif
c
            if(wl.eq.4 .and. i.eq.iprl .and. imax.gt.4) then
              im2jk=ijk_str2unstr(ii2*(k-1)+ii1*(j-1)+im2+ii5)
              uhalfm=-dudp(imjk)*(vvect(ijk,igfy)-vvect(im2jk,igfy))
            else
              uhalfm=-dudp(imjk)*(vvect(ijk,igfy)-vvect(imjk,igfy))
            endif
c
            if(wf.eq.4 .and. j.eq.jprbk .and. jmax.gt.4) then
              ij2k=ijk_str2unstr(ii2*(k-1)+ii1+i+ii5)
              vhalfp=-dvdp(ijk)*(vvect(ij2k,igfy)-vvect(ijk,igfy))
            else
              vhalfp=-dvdp(ijk)*(vvect(ijpk,igfy)-vvect(ijk,igfy))
            endif
c
            if(wf.eq.4 .and. j.eq.jprf .and. jmax.gt.4) then
              ijm2k=ijk_str2unstr(ii2*(k-1)+ii1*(jm2-1)+i+ii5)
              vhalfm=-dvdp(ijmk)*(vvect(ijk,igfy)-vvect(ijm2k,igfy))
            else
              vhalfm=-dvdp(ijmk)*(vvect(ijk,igfy)-vvect(ijmk,igfy))
            endif
c
            if(wb.eq.4 .and. k.eq.kprt .and. kmax.gt.4) then
              ijk2=ijk_str2unstr(ii2+ii1*(j-1)+i+ii5)
              whalfp=-dwdp(ijk)*(vvect(ijk2,igfy)-vvect(ijk,igfy))
            else
              whalfp=-dwdp(ijk)*(vvect(ijkp,igfy)-vvect(ijk,igfy))
            endif
c
            if(wb.eq.4 .and. k.eq.kprb .and. kmax.gt.4) then
              ijkm2=ijk_str2unstr(ii2*(km2-1)+ii1*(j-1)+i+ii5)
              whalfm=-dwdp(ijkm)*(vvect(ijk,igfy)-vvect(ijkm2,igfy))
            else
              whalfm=-dwdp(ijkm)*(vvect(ijk,igfy)-vvect(ijkm,igfy))
            endif

            vvect(ijk,igfyp1)=rri(i)*(rdx(i)*(afr(ijk)*uhalfp/rr(i)-
     1        afr(imjk)*uhalfm/rr(i-1))+
     2        rdy(j)*(afb(ijk)*vhalfp-afb(ijmk)*vhalfm))+
     3        rdz(k)*(aft(ijk)*whalfp-aft(ijkm)*whalfm)
     4        +vf(ijk)*rcsqf(ijk)*rdelt*vvect(ijk,igfy)
            vvect(ijk,igfyp1)=vvect(ijk,igfyp1)*beta(ijk)

          enddo ! (ijk)

!$acc end region

$pgf95 -DP4 -DWIN32 -c -O3 -mp -Mpreprocess -Bstatic -Mcuda -ta=nvidia -Minfo -Mfixed -V10.9 -Kieee -Ktrap-fp program.F
(...)
             Generating copy(vvect(:,igfy:igfyp1))
(...)

After executing it on the GPU, some elements of the vvect array are NaN. They are not NaN when the code is executed on the CPU.
The funny thing is that when I remove the copy() clause from the code and leave only:

!$acc region do local(ijk,i,j,k)

The resulting array contains only zeros. This is weird because the compiler adds the directive

             Generating copy(vvect(:,igfy:igfyp1))

on its own, so there should not be any difference.

So, any ideas where the NaNs are coming from and why those two versions of the directive give different results?

I thought about emulating the GPU and writing out all the variables in each iteration, but I understand that I cannot emulate the GPU with the PGI Accelerator model, right? If I could, I would check all the variables that are used to compute the vvect elements. So, are there ways to check this other than moving from the PGI Accelerator model to CUDA Fortran?

szczelba · October 27, 2010, 11:58am

We have a saying in Poland: “He who asks does not go astray.” So, I’ve asked you and partially solved my problem on my own. ;)
OK, so the NaNs are caused by the rcsqf array, which is used in the calculation of vvect. This array is declared as follows:

      real(kind(zzz)), dimension(:), allocatable, save, target :: rcsqf

The others are declared similarly but without the “target” attribute. I assume there are some problems with pointers. How can I correctly copy the values of the rcsqf array to the GPU?

MatColgrove · October 28, 2010, 7:44pm

Hi szczelba,

While I doubt it’s the problem, mixing CUDA Fortran and the PGI Accelerator Model isn’t supported on Windows. So the first thing to try is to remove the “-Mcuda” flag.

$pgf95 -DP4 -DWIN32 -c -O3 -mp -Mpreprocess -Bstatic -ta=nvidia -Minfo -Mfixed -V10.9 -Kieee -Ktrap-fp program.F

> The others are declared similarly but without the “target” attribute. I assume there are some problems with pointers. How can I correctly copy the values of the rcsqf array to the GPU?

I don’t see how the target attribute could affect this, but then again it could be a compiler bug.

What is “rcsqf”'s Minfo copy message? What happens if you add “rcsqf” to the region’s copy directive?

Can you send the code to PGI Customer Service (trs@pgroup.com) and ask them to send it to me? If it is a compiler bug, I’d like to send a report to our engineers.

Thanks,
Mat

szczelba · October 29, 2010, 2:43pm

The copy message is:

   Generating copyin(rcsqf$p(imoj4:imoj5))

I see this “$p” suffix only for this array, which is the only one declared as “target”.
Adding rcsqf to the region copy directive does not change anything, not even the copy message above (it doesn’t change from copyin to copy).

When I copy the rcsqf values to another array on the GPU and then write out this temporary array, I get something like:

 'ijk='         1225 ' '    0.000000000000000     
 'ijk='         1226 ' '    0.000000000000000     
 'ijk='         1227 ' '    0.000000000000000     
 'ijk='         1228 ' '    0.000000000000000     
 'ijk='         1229 ' '    0.000000000000000     
 'ijk='         1230 ' '    0.000000000000000     
 'ijk='         1231 ' ' ********************     
 'ijk='         1232 ' '    0.000000000000000     
 'ijk='         1233 ' ' ********************     
 'ijk='         1234 ' ' ********************     
 'ijk='         1235 ' ' ********************     
 'ijk='         1236 ' '                       NaN
 'ijk='         1237 ' '    0.000000000000000     
 'ijk='         1238 ' '    0.000000000000000     
 'ijk='         1239 ' ' ********************     
 'ijk='         1240 ' '    0.000000000000000     
 'ijk='         1241 ' ' ********************     
 'ijk='         1242 ' ' ********************     
 'ijk='         1243 ' '    0.000000000000000     
 'ijk='         1244 ' ' ********************     
 'ijk='         1245 ' ' ********************     
 'ijk='         1246 ' '    0.000000000000000     
 'ijk='         1247 ' ' ********************     
 'ijk='         1248 ' ' ********************     
 'ijk='         1249 ' '    0.000000000000000     
 'ijk='         1250 ' ' ********************     
 'ijk='         1251 ' ' ********************     
 'ijk='         1252 ' ' ********************     
 'ijk='         1253 ' '                       NaN
 'ijk='         1254 ' '    0.000000000000000     
 'ijk='         1255 ' '    0.000000000000000     
 'ijk='         1256 ' ' ********************     
 'ijk='         1257 ' '    0.000000000000000     
 'ijk='         1258 ' ' ********************     
 'ijk='         1259 ' ' ********************     
 'ijk='         1260 ' '    0.000000000000000     
 'ijk='         1261 ' ' ********************     
 'ijk='         1262 ' ' ********************     
 'ijk='         1263 ' '    0.000000000000000     
 'ijk='         1264 ' ' ********************     
 'ijk='         1265 ' '    0.000000000000000     
 'ijk='         1266 ' '    0.000000000000000     
 'ijk='         1267 ' ' ********************     
 'ijk='         1268 ' ' ********************     
 'ijk='         1269 ' ' ********************     
 'ijk='         1270 ' ' ********************     
 'ijk='         1271 ' '    0.000000000000000     
 'ijk='         1272 ' '    0.000000000000000     
 'ijk='         1273 ' ' ********************     
 'ijk='         1274 ' '    0.000000000000000     
 'ijk='         1275 ' ' ********************     
 'ijk='         1276 ' '    0.000000000000000     
 'ijk='         1277 ' '    0.000000000000000     
 'ijk='         1278 ' ' ********************     
 'ijk='         1279 ' ' ********************     
 'ijk='         1280 ' ' ********************     
 'ijk='         1281 ' ' ********************     
 'ijk='         1282 ' '    0.000000000000000     
 'ijk='         1283 ' '    0.000000000000000     
 'ijk='         1284 ' ' ********************     
 'ijk='         1285 ' '    0.000000000000000     
 'ijk='         1286 ' ' ********************     
 'ijk='         1287 ' '    0.000000000000000     
 'ijk='         1288 ' '    0.000000000000000     
 'ijk='         1289 ' ' ********************     
 'ijk='         1290 ' ' ********************     
 'ijk='         1291 ' ' ********************     
 'ijk='         1292 ' ' ********************     
 'ijk='         1293 ' '    0.000000000000000     
 'ijk='         1294 ' '    0.000000000000000     
 'ijk='         1295 ' ' ********************     
 'ijk='         1296 ' '    0.000000000000000     
 'ijk='         1297 ' ' ********************     
 'ijk='         1298 ' ' ********************     
 'ijk='         1299 ' '    0.000000000000000     
 'ijk='         1300 ' ' ********************     
 'ijk='         1301 ' ' ********************     
 'ijk='         1302 ' ' ********************     
 'ijk='         1303 ' '                       NaN
 'ijk='         1304 ' '    0.000000000000000     
 'ijk='         1305 ' '    0.000000000000000     
 'ijk='         1306 ' ' ********************     
 'ijk='         1307 ' '    0.000000000000000     
 'ijk='         1308 ' ' ********************     
 'ijk='         1309 ' ' ********************     
 'ijk='         1310 ' '    0.000000000000000     
 'ijk='         1311 ' ' ********************     
 'ijk='         1312 ' ' ********************     
 'ijk='         1313 ' '    0.000000000000000     
 'ijk='         1314 ' ' ********************     
 'ijk='         1315 ' ' ********************     
 'ijk='         1316 ' '    0.000000000000000     
 'ijk='         1317 ' ' ********************     
 'ijk='         1318 ' ' ********************     
 'ijk='         1319 ' ' ********************     
 'ijk='         1320 ' ' ********************     
 'ijk='         1321 ' '    0.000000000000000     
 'ijk='         1322 ' '    0.000000000000000     
 'ijk='         1323 ' ' ********************     
 'ijk='         1324 ' '    0.000000000000000     
 'ijk='         1325 ' ' ********************     
 'ijk='         1326 ' ' ********************     
 'ijk='         1327 ' '    0.000000000000000     
 'ijk='         1328 ' ' ********************     
 'ijk='         1329 ' ' ********************     
 'ijk='         1330 ' ' ********************     
 'ijk='         1331 ' ' ********************     
 'ijk='         1332 ' '    0.000000000000000     
 'ijk='         1333 ' '    0.000000000000000     
 'ijk='         1334 ' ' ********************     
 'ijk='         1335 ' '    0.000000000000000     
 'ijk='         1336 ' ' ********************     
 'ijk='         1337 ' ' ********************     
 'ijk='         1338 ' '    0.000000000000000     
 'ijk='         1339 ' ' ********************     
 'ijk='         1340 ' ' ********************     
 'ijk='         1341 ' ' ********************     
 'ijk='         1342 ' ' ********************     
 'ijk='         1343 ' '    0.000000000000000     
 'ijk='         1344 ' '    0.000000000000000     
 'ijk='         1345 ' ' ********************     
 'ijk='         1346 ' '    0.000000000000000     
 'ijk='         1347 ' ' ********************     
 'ijk='         1348 ' ' ********************     
 'ijk='         1349 ' '    0.000000000000000     
 'ijk='         1350 ' ' ********************     
 'ijk='         1351 ' ' ********************     
 'ijk='         1352 ' ' ********************     
 'ijk='         1353 ' ' ********************     
 'ijk='         1354 ' '    0.000000000000000     
 'ijk='         1355 ' '    0.000000000000000     
 'ijk='         1356 ' ' ********************     
 'ijk='         1357 ' '    0.000000000000000     
 'ijk='         1358 ' ' ********************     
 'ijk='         1359 ' ' ********************     
 'ijk='         1360 ' '    0.000000000000000     
 'ijk='         1361 ' ' ********************

Besides some NaNs there are also some stars instead of values.
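
In Fortran formatted output, a field filled with asterisks usually means the value did not fit the field width, so the starred entries are probably very large or non-finite values rather than valid data. A minimal sketch of a host-side check that separates these cases, assuming the compiler provides the standard ieee_arithmetic module; the array name tmp, its contents, and the bound limit are hypothetical stand-ins, not values from the real program:

```fortran
program classify_values
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_is_finite, &
                                           ieee_value, ieee_quiet_nan
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: limit = 1.0e3_dp   ! hypothetical plausibility bound
  real(dp) :: tmp(5)
  integer :: ijk

  ! Hypothetical stand-ins for values copied back from the GPU.
  tmp = (/ 0.0_dp, 1.5_dp, 1.0e200_dp, 0.0_dp, 0.0_dp /)
  tmp(4) = ieee_value(1.0_dp, ieee_quiet_nan)
  tmp(5) = huge(1.0_dp)

  do ijk = 1, size(tmp)
     if (ieee_is_nan(tmp(ijk))) then
        print '(a,i6,a)', 'ijk=', ijk, '  NaN'
     else if (.not. ieee_is_finite(tmp(ijk))) then
        print '(a,i6,a)', 'ijk=', ijk, '  infinite'
     else if (abs(tmp(ijk)) > limit) then
        ! The ES descriptor keeps the exponent, so huge values stay readable
        ! instead of overflowing the field.
        print '(a,i6,a,es24.15e3)', 'ijk=', ijk, '  suspicious ', tmp(ijk)
     else
        print '(a,i6,a,es24.15e3)', 'ijk=', ijk, '  ok         ', tmp(ijk)
     endif
  enddo
end program classify_values
```

Printing with an exponent format makes it easier to tell whether the bad entries look like uninitialized memory or like plausible values that were merely too wide for the field.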

Sending all the code would be difficult because I’m working on a program that belongs to someone else. I have the source code of only one procedure and execute it by starting the main program with special parameters. I don’t think I’m allowed to send this code to anybody.

mwolfe · October 30, 2010, 1:39am

Fortran arrays declared with the target attribute are usually the target of pointer assignments. Look for a pointer assignment, something like

    ptr => rcsqf

where ptr is any Fortran pointer array. If there is a pointer assignment, and the pointer is also used in the accelerator region, there will be a problem. Consider a program like

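   ! a1 and p1 are rank-1 here to match the a1(i) / p1(i) references;
   ! b and n are assumed to be declared elsewhere
   real, dimension(:), allocatable, target :: a1
   real, dimension(:), pointer :: p1
   p1 => a1
   !$acc region do
    do i = 1, n
     a1(i) = 0.0
     b(i) = p1(i)
    enddo
   !$acc end region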

In the original program, a1 and p1 refer to the same memory locations. However, the accelerator compiler can’t preserve the pointer / target relationship of the data that is copied to the GPU. So the compiler will allocate and copy data for a1 and for p1 separately. On the host, p1(i) would get the same value that was just stored by a1(i)=0.0; on the GPU, p1(i) would get uninitialized memory, because the GPU copy of p1 would be at a different place in memory.
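
The host-side half of this description can be checked with a small standalone program. The sketch below is a rank-1 variant of the example with no accelerator directives; the size n, the array b, and the initial value 1.0 are assumptions added for illustration. It only shows that p1 and a1 share storage on the host and does not reproduce the GPU behaviour.

```fortran
program alias_demo
  implicit none
  integer, parameter :: n = 4          ! assumed size, for illustration only
  real, dimension(:), allocatable, target :: a1
  real, dimension(:), pointer :: p1
  real, dimension(n) :: b
  integer :: i

  allocate(a1(n))
  a1 = 1.0                             ! assumed initial contents
  p1 => a1                             ! p1 now aliases the storage of a1

  do i = 1, n
     a1(i) = 0.0                       ! store through the target ...
     b(i) = p1(i)                      ! ... and read it back through the pointer
  enddo

  ! On the host, every b(i) should be the 0.0 just stored through a1.
  if (any(b /= 0.0)) then
     print *, 'pointer and target do not alias'
  else
     print *, 'b = ', b
  endif
  print *, 'associated(p1, a1) = ', associated(p1, a1)

  deallocate(a1)
end program alias_demo
```

If a1 and p1 were instead given separate device copies, the read through p1 would no longer see the store through a1, which is the failure mode described above.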

szczelba · November 3, 2010, 8:43am

Michael, thanks for your response. I understand what you mean, but I don’t think it is the exact case.
I have an array rcsqf declared as:

      real(kind(zzz)), dimension(:), allocatable, save, target :: rcsqf

Since I don’t have access to the full source code, I can only assume that there is some pointer that points to this array. But I’m pretty sure it is not used in the code that I want to execute on the GPU. Moreover, nothing new is written into the rcsqf array during execution on the GPU.

Based on your post and on this line from compilation stage:

  Generating copyin(rcsqf$p(imoj4:imoj5))

I assume that no values from rcsqf were copied to the GPU, just the pointers (note the “$p” suffix on the “rcsqf” name). So this is a slightly different problem. Why should the copyin directive copy only pointers to the target array, when it is a normally allocated array? The only difference is that some pointer may point to it.
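
One experiment that might sidestep the rcsqf$p descriptor, sketched here under assumptions and not confirmed anywhere in this thread, is to copy the needed slice of rcsqf on the host into an ordinary allocatable array without the target attribute and to reference only that copy inside the region. The names rcsqf_tmp and out, the bounds, and the data are hypothetical. The directives use the same !$acc region syntax as the loop in the first post; a compiler without PGI Accelerator support should treat them as comments, so the program can also be run on the host.

```fortran
program copy_workaround
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: imoj4 = 1, imoj5 = 8            ! hypothetical bounds
  real(dp), dimension(:), allocatable, target :: rcsqf
  real(dp), dimension(:), allocatable :: rcsqf_tmp      ! no target attribute
  real(dp), dimension(:), allocatable :: out
  integer :: ijk

  allocate(rcsqf(imoj4:imoj5), rcsqf_tmp(imoj4:imoj5), out(imoj4:imoj5))
  do ijk = imoj4, imoj5
     rcsqf(ijk) = 0.5_dp * ijk                          ! hypothetical data
  enddo

  ! Plain host-side copy; only the copy is referenced inside the region.
  rcsqf_tmp = rcsqf

!$acc region copyin(rcsqf_tmp), copyout(out)
  do ijk = imoj4, imoj5
     out(ijk) = 2.0_dp * rcsqf_tmp(ijk)
  enddo
!$acc end region

  print *, out
  deallocate(rcsqf, rcsqf_tmp, out)
end program copy_workaround
```

Whether this changes the generated copyin message or the values seen on the GPU would have to be checked with -Minfo and a comparison against the CPU results.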
