explanation of PGI output needed

Hi Mat,

I have several questions (PGI 13.2):

  1. What does ‘*’ mean? It appears for the wkl variable.
  2. Arrays QS, QR, QC, etc. are declared with dimension(its:ite,jts:jte,kts:kte). Why does PGI use ‘1:kte’ as the last dimension? Why not ‘pcopy’ the entire array (qs(:,:,:)…)?
  3. Why does PGI generate allocate for some arrays (coldry, wx, wkl,…)? I expected to see present_or_… records.
  4. It seems PGI failed to calculate the indexes correctly. Given that I have the following lines:
       DO 3200 L=kte+1,NLAYERS,1
          TZ(ii,jj,L) = varint(L) + (TZ(ii,jj,kte) - varint(kte))
          TAVEL(ii,jj,L) = 0.5*(TZ(ii,jj,L) + TZ(ii,jj,L-1))
 3200  CONTINUE

it’s rather strange to see records

Generating copyin(tz(its:ite,jts:jte,kte:nlayers-1))
         Generating copyout(tz(its:ite,jts:jte,:kte))



   3117, Generating present_or_copyout(taucloud(its:ite,jts:jte,1:kte))
         Generating present_or_copyin(qs(its:ite,jts:jte,1:kte))
         Generating present_or_copyin(qr(its:ite,jts:jte,1:kte))
         Generating present_or_copyin(qi(its:ite,jts:jte,1:kte))
         Generating present_or_copyin(qc(its:ite,jts:jte,1:kte))
         Generating present_or_copyin(cldfra(its:ite,jts:jte,1:kte))
         Generating present_or_copyin(delz(its:ite,jts:jte,1:kte))
         Generating present_or_copyout(cldfrac(its:ite,jts:jte,1:kte))
         Generating present_or_copyin(ixindx(:4))
         Generating present_or_copyin(o3wrk(:))
         Generating present_or_copyin(ppwrkh(:))
         Generating present_or_copyin(o3prof(its:ite,jts:jte,1:kte))
         Generating present_or_copyout(tavel(its:ite,jts:jte,1:kte))
         Generating present_or_copyin(t(its:ite,jts:jte,1:kte))
         Generating allocate(pavel(its:ite,jts:jte,:))
         Generating present_or_copyout(pavel(its:ite,jts:jte,nlayers))
         Generating present_or_copyin(p(its:ite,jts:jte,1:kte))
         Generating present_or_copyout(tbound(its:ite,jts:jte))
         Generating present_or_copyin(tsfc(its:ite,jts:jte))
         Generating copyin(qv(its:ite,jts:jte,:kte))
         Generating copyout(qv(its:ite,jts:jte,kts:kte))
         Generating present_or_copyin(pw(its:ite,jts:jte,:))
         Generating present_or_copyin(tw(its:ite,jts:jte,1:kte+1))
         Generating copyin(tz(its:ite,jts:jte,kte:nlayers-1))
         Generating copyout(tz(its:ite,jts:jte,:kte))
         Generating present_or_copyin(pprof(:))
         Generating copyin(pz(its:ite,jts:jte,:))
         Generating copyout(pz(its:ite,jts:jte,nlayers))
         Generating present_or_copyin(tprof(:))
         Generating allocate(coldry(its:ite,jts:jte,1:))
         Generating copyin(coldry(its:ite,jts:jte,1:nlayers))
         Generating copyout(coldry(its:ite,jts:jte,1:kte))
         Generating allocate(wx(its:ite,jts:jte,:,1:))
         Generating copyin(wx(its:ite,jts:jte,:,1:nlayers))
         Generating copyout(wx(its:ite,jts:jte,:,kts:nlayers))
         Generating allocate(wkl(its:ite,jts:jte,:35,:nlayers))
         Generating copy(wkl(its:ite,jts:jte,:6,1:*))
  1. What does ‘*’ mean? It appears for the wkl variable.

Hmmm, I’ve actually not seen this before. I’ll need to ask.

  2. Arrays QS, QR, QC, etc. are declared with dimension(its:ite,jts:jte,kts:kte). Why does PGI use ‘1:kte’ as the last dimension? Why not ‘pcopy’ the entire array (qs(:,:,:)…)?

Sans a data clause by the user, the compiler looks at how the array is accessed within the loop, not how it was declared, and by default only allocates and copies the elements used. My guess is that the index variable’s range for the third dimension goes from 1 to kte, hence the “1:kte”.

  3. Why does PGI generate allocate for some arrays (coldry, wx, wkl,…)? I expected to see present_or_… records.
  4. It seems PGI failed to calculate the indexes correctly. Given that I have the following lines:

In both cases, the elements written when the array is assigned are different from the elements accessed when it is read. This causes the copyin and copyout clauses to be different.
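To make the read/write distinction concrete, here is a minimal sketch that only records which third-dimension indexes of TZ are read and which are written by loop 3200 taken on its own. The sizes kte = 4 and nlayers = 8 and the names is_read and is_written are made up for illustration. It does not claim to reproduce how the compiler derives its messages.

```fortran
! Sketch only: hypothetical sizes, not taken from module_ra_rrtm.f90.
program access_ranges_sketch
  implicit none
  integer, parameter :: kte = 4, nlayers = 8
  logical :: is_read(0:nlayers), is_written(0:nlayers)
  integer :: l

  is_read = .false.
  is_written = .false.

  ! Mirror the TZ accesses of loop 3200 without doing the arithmetic.
  do l = kte + 1, nlayers
     is_written(l) = .true.     ! TZ(ii,jj,L) on the left-hand side
     is_read(kte) = .true.      ! TZ(ii,jj,kte)
     is_read(l) = .true.        ! TZ(ii,jj,L) in the TAVEL expression
     is_read(l - 1) = .true.    ! TZ(ii,jj,L-1) in the TAVEL expression
  end do

  ! One line per index: which elements the loop reads and which it writes.
  do l = 0, nlayers
     print '(a,i0,a,l1,a,l1)', 'index ', l, '  read=', is_read(l), &
          '  written=', is_written(l)
  end do
end program access_ranges_sketch
```

For this loop alone the read set and the written set are not the same range, which is why separate copyin and copyout bounds can appear for one array.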

  • Mat

  1. The “*” is printed because the details were too complex to print.

Thank you Carl!

Mat,

  4. It seems PGI failed to calculate the indexes correctly. Given that I have the following lines:
       DO 3200 L=kte+1,NLAYERS,1
          TZ(ii,jj,L) = varint(L) + (TZ(ii,jj,kte) - varint(kte))
          TAVEL(ii,jj,L) = 0.5*(TZ(ii,jj,L) + TZ(ii,jj,L-1))
 3200  CONTINUE

it’s rather strange to see records

Generating copyin(tz(its:ite,jts:jte,kte:nlayers-1))
Generating copyout(tz(its:ite,jts:jte,:kte))

That is the only place I access the TZ variable in the subroutine. Looking at the indexes, I expected to see

Generating copyin(tz(its:ite,jts:jte,kte:nlayers))
Generating copyout(tz(its:ite,jts:jte,kte+1:nlayers))

Last indexes are not correct in PGI’s output.

Last indexes are not correct in PGI’s output.

Ok, we’ll take a look, though we’ll need a reproducing example. The workaround would be to use a copy clause to explicitly copy the array.
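As a rough illustration of that workaround, the sketch below puts an explicit copy clause on a kernels region so the whole array is transferred in both directions instead of a compiler-inferred sub-array. The program name, the array sizes, and the reduced two-dimensional shapes are assumptions made for the example; only the loop body follows the lines quoted above.

```fortran
! Sketch only: sizes and the 2-D shapes are hypothetical, not the WRF declarations.
program copy_clause_sketch
  implicit none
  integer, parameter :: its = 1, ite = 3, kte = 4, nlayers = 8
  real :: tz(its:ite, 0:nlayers), tavel(its:ite, 1:nlayers)
  real :: varint(0:nlayers)
  integer :: ii, l

  do l = 0, nlayers
     varint(l) = real(l)
  end do
  tz = 0.0
  tz(:, 0:kte) = 280.0
  tavel = 0.0

  ! copy(...) names the whole arrays, so every element is copied to the
  ! device at region entry and back to the host at region exit.
  !$acc kernels copy(tz, tavel) copyin(varint)
  do ii = its, ite
     do l = kte + 1, nlayers
        tz(ii, l) = varint(l) + (tz(ii, kte) - varint(kte))
        tavel(ii, l) = 0.5 * (tz(ii, l) + tz(ii, l - 1))
     end do
  end do
  !$acc end kernels

  print *, tz(its, :)
end program copy_clause_sketch
```

Built without -acc, the directives are treated as comments and the program runs on the host, which makes it easy to compare results. The trade-off is that copying the whole array moves more data than a sub-array would.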

  • Mat

Hi Mat,

I put an example in ~aromanenko/WRF.1/ on danger2.

pgf90 -acc -c -Minfo=accel test.f90



         Generating copyin(tz(its:ite,jts:jte,kte:nlayers-1))
         Generating copyout(tz(its:ite,jts:jte,:kte))



3113       TZ(ii,jj,0) = Tw(ii,jj,kte+1)
3114       DO 2000 L = 1, kte
...
3118          TZ(ii,jj,L) = Tw(ii,jj,kte+1-L)
....
3127  2000 CONTINUE
...
3160        DO 3200 L=kte+1,NLAYERS,1
3161           TZ(ii,jj,L) = varint(L) + (TZ(ii,jj,kte) - varint(kte))
3162           TAVEL(ii,jj,L) = 0.5*(TZ(ii,jj,L) + TZ(ii,jj,L-1))
3163  3200 CONTINUE

Hi Mat,

Did you have a chance to look at my example on the danger server?

Here is another question:

upload CUDA data  file=/.../phys/module_ra_rrtm.f90 function=rrtm line=1954 device=0 variable=emiss bytes=23856
upload CUDA data  file=/.../phys/module_ra_rrtm.f90 function=rrtm line=1961 device=0 variable=ppwrkh bytes=128
upload CUDA data  file=/.../phys/module_ra_rrtm.f90 function=rrtm line=1961 device=0 variable=o3wrk bytes=124
launch CUDA kernel  file=/.../phys/module_ra_rrtm.f90 function=rrtm line=1964 device=0 grid=1x29 block=128
download CUDA data  file=/.../phys/module_ra_rrtm.f90 function=rrtm line=-1983 device=0 variable=o3prof bytes=508080
upload CUDA data  file=/.../phys/module_ra_rrtm.f90 function=rrtm line=-1998 device=0 variable=coldry bytes=735840
upload CUDA data  file=/.../phys/module_ra_rrtm.f90 function=rrtm line=-1998 device=0 variable=colh2o bytes=735840
upload CUDA data  file=/.../phys/module_ra_rrtm.f90 function=rrtm line=-1998 device=0 variable=colco2 bytes=735840

What does a negative line number mean?

Alexey