Serial construct in OpenACC, Fortran

In the following serial part of a Fortran code, I am trying to sequentially copy the value of unkno(iva,jpoin)+bppnr(iva,ippne) to unkno(iva,ipoin). The code is:

c$acc  serial 
       nppn1=0
c
       do 1200 ippas=1,50
c
       nppn0=nppn1+1
       nppn1=lppas(ippas)
c
c     -----did we complete the passes ?
c
       if(nppn1.eq.0)                                          goto 1201
c
c     -----do we have any ?
c
       if(nppn0.gt.nppn1)                                      goto 1199
c
c     -----loop over the receiving points
c
c$acc  loop seq 
       do 1400 ippne=nppn0,nppn1
c
c     -----points
c
       ipoin=bppni(1,ippne)
       jpoin=bppni(2,ippne)
c
c     -----variables 1-nunkp
c
c$acc  loop seq
       do 1410 iva=1,nunkp
       unkno(iva,ipoin)=unkno(iva,jpoin)+bppnr(iva,ippne)
 1410 continue
c
c     ----end of loop over the receiving points
c     
 1400 continue

c
c
c     ----end of loop over the passes
c     
 1199 continue
 1200 continue
 1201 continue
c$acc  end serial

The result obtained using the PGI compiler with the OpenACC directives deactivated and bppnr=0 is:

            ipoin        jpoin         unkno(iva,ipoin)   unkno(iva,jpoin)
before loop 160215       160165       100.3518075025082   100.3517910648527      
 after loop 160215       160165       100.3517910648527   100.3517910648527        
before loop 160165       157415       100.3517910648527   100.3517910648527         
 after loop 160165       157415       100.3517910648527   100.3517910648527

The result obtained using the PGI compiler with the OpenACC directives activated is:

             ipoin     jpoin    unkno(iva,ipoin)           unkno(iva,jpoin)
before loop  160215    160165   100.3518075025082         100.3517910648527     
after  loop  160215    160165   100.3518075025082         100.3517910648527

When compiling, the PGI compiler says the following:

  2552, Accelerator serial kernel generated
         Generating Tesla code
       2556, !$acc do seq
       2588, !$acc do seq
       2603, !$acc do seq
   2552, Generating implicit copyin(bppnr(:nunkp,:),bppni(:2,:),lppas(:))
         Generating implicit copy(unkno(:nunkp,:))

So, I don’t know what is happening here and why the values of unkno are not correctly updated. Any ideas?

Hi afiguer,

I’m assuming the issue here is that you’re expecting different results before and after. Unfortunately, there’s not enough information here to determine what’s wrong. Are you able to provide a full reproducing example?

If not, please post the compiler feedback messages (-Minfo=accel) so I can see how the compiler is offloading this code. Also, how are you handling data movement? Are you using data regions, or are you relying on the compiler to implicitly copy the data when reaching the serial compute region?

-Mat

Hi Mat,

I accidentally pressed the submit button before the post was ready. I started another post
with the correct issue.

Best,
Alejandro

Hi Alejandro,

Since you’re a new poster, your posts default to the moderated queue, where I first need to approve them. This protects against spam. I’ve since moved you to registered-user status, so your posts are now unmoderated. I saw the second post but deleted it since I thought it was a duplicate. Sorry.

I did, though, see your post on Stack Overflow (“gpu - serial construct in openacc, fortran”), which included the compiler feedback.

That is still not enough information to determine the issue, so a reproducing example would be helpful. Also, do you have any higher-level data regions? If so, are you copying “unkno” back from the device or using an “update self(unkno)” directive?
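For readers unfamiliar with the “update self” directive mentioned above, here is a minimal sketch of the pattern. It is written in free-form Fortran (the `!$acc` sentinel) for brevity rather than the fixed-form style of the original code. The program name and array sizes are made up for illustration, and only the array name unkno is borrowed from the thread.

```fortran
program update_self_sketch
  implicit none
  ! Hypothetical sizes, chosen only to keep the example small.
  integer, parameter :: nunkp = 2, npoin = 4
  real*8  :: unkno(nunkp, npoin)
  integer :: iva, ipoin

  unkno = 1.0d0

  ! Create a device copy that outlives any single compute region.
  !$acc enter data copyin(unkno)

  ! unkno is already present, so no implicit copy happens here.
  !$acc serial present(unkno)
  do ipoin = 1, npoin
    do iva = 1, nunkp
      unkno(iva, ipoin) = unkno(iva, ipoin) + 1.0d0
    end do
  end do
  !$acc end serial

  ! Refresh the host copy from the device. Without this directive the
  ! host would still see the old values in an OpenACC build.
  !$acc update self(unkno)

  print *, 'host value after update:', unkno(1, 1)

  !$acc exit data delete(unkno)
end program update_self_sketch
```

When the directives are disabled they are plain comments, so the same source runs on the host and the update is a no-op. That difference is one reason results can diverge between the two builds if an update is missing.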

-Mat

Hi Mat,

I updated the post, so it is now a bit clearer. My idea is to run this part of the code on the device, so there is no transfer between the host and the device. Unkno is already on the device, and the arrays bppnr, bppni and lppas are updated from the host to the device before entering the subroutine.

I know that, before entering the subroutine containing the code I shared, the values of all the arrays are correct. I will try to generate a reproducible example soon.

Alejandro

Most likely, you’re just missing an “update” directive someplace.

Is there any way to see what the device is doing inside a sequential loop? For example, by writing the results in each iteration?

Yes, list-directed printing (print *) is allowed in device compute regions. The output is buffered on the device, which can intermix results when run in parallel, but it should come out in order from a serial region.
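As a rough illustration of printing from a serial region, the sketch below writes one line per iteration of a sequential loop. It uses free-form source, and the program name, array a and size n are hypothetical rather than taken from the code under discussion.

```fortran
program device_print_sketch
  implicit none
  ! Hypothetical size and array, for illustration only.
  integer, parameter :: n = 4
  real*8  :: a(n)
  integer :: i

  a = 0.0d0

  !$acc serial copy(a)
  !$acc loop seq
  do i = 1, n
    a(i) = 2.0d0 * i
    ! print * only; formatted writes are not assumed to be
    ! supported in device code.
    print *, 'iter', i, a(i)
  end do
  !$acc end serial

  print *, 'host sum', sum(a)
end program device_print_sketch
```

Because the region is serial, the per-iteration lines should appear in loop order. In a parallel region the same print could interleave across threads.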

I used a print statement at two different positions relative to the inner loop, plus a stop statement, to see what is happening with the first point only.

       do 1400 ippne=nppn0,nppn1
c
c     -----points
c
       ipoin=bppni(1,ippne)
       jpoin=bppni(2,ippne)
       print *,'before',ipoin,jpoin,ippne,unkno(1,ipoin),
     &           unkno(1,jpoin),nppn0,nppn1
c
c     -----variables 1-nunkp
c
c$acc  loop seq
       do 1410 iva=1,nunkp
       unkno(iva,ipoin)=unkno(iva,jpoin)+bppnr(iva,ippne)
 1410 continue
       print * ,'after',ipoin,jpoin,ippne,unkno(1,ipoin),
     &           unkno(1,jpoin)
       stop
c
c     ----end of loop over the receiving points
c     
 1400 continue

The values that I get are the following,

        
before       6053         8803        13006    100.3518075025082      100.3517910648528                1      13005
after        6053         8803        13006    100.3517910648528      100.3517910648528

The problem here is that, although the loop runs from nppn0 (1) to nppn1 (13005), the first value of ippne is 13006.
Any guess as to why this is happening?

Best guess is that it’s a compiler issue, but I can’t really tell without a reproducing example.

How is ippne declared? Does this happen the first time through the loop or the second?

This occurs in the first loop iteration. In the subroutine I use the following:

implicit real*8 (a-h,o-z)

Based on that, ippne is an integer. Any suggestion on how to declare the variable
on the device?
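To make the typing rule concrete: `implicit real*8 (a-h,o-z)` only covers names starting with a-h and o-z, so names starting with i-n keep the default integer type. The small host-only sketch below shows this (free-form source, with the hypothetical variable unk alongside ippne). As far as I know, a scalar needs no separate device-side declaration, since its host type carries over into the compute region. Treat that as an assumption, not a statement about this particular compiler bug.

```fortran
program implicit_sketch
  implicit real*8 (a-h,o-z)

  ! Starts with "i": not covered by the rule above, so default integer.
  ippne = 7
  ! Starts with "u": covered by the rule above, so real*8.
  unk = 7

  print *, 'ippne/2 =', ippne/2   ! integer division
  print *, 'unk/2   =', unk/2     ! real division
end program implicit_sketch
```

An explicit `integer ippne,nppn0,nppn1` declaration, as used in the subroutine later in this thread, states the same types without relying on the implicit rule.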

I tried to create a reproducing example, but without success.
However, I tried something different:

       subroutine fd_bcunkperibc_gpu(nppni ,nppnr ,nunkp ,npoin ,mppne ,
     &                               bppni ,bppnr ,lppas ,unkno )
c
       implicit real*8 (a-h,o-z)
c
       integer bppni(nppni,mppne),lppas(50)
       integer ippne,nppn0,nppn1
       real*8  bppnr(nppnr,mppne),unkno(nunkp,npoin)
c
c
c     -----loop over the passes
c
       nppn1=0
c
       do 1200 ippas=1,50
c
       nppn0=nppn1+1
       nppn1=lppas(ippas)
c
c     -----did we complete the passes ?
c
       if(nppn1.eq.0)                                          goto 1201
c
c     -----do we have any ?
c
       if(nppn0.gt.nppn1)                                     goto 1199
c
c     -----loop over the receiving points
c
c$acc  enter data copyin(nppn0,nppn1,ippas)
c$acc  kernels  present(bppni,bppnr,unkno)
c$acc  loop seq 
       do 1400 ippne=nppn0,nppn1
c
c     -----points
c
       ipoin=bppni(1,ippne)
       jpoin=bppni(2,ippne)
c
c     -----variables 1-nunkp
c
c$acc  loop seq
       do 1410 iva=1,nunkp
       unkno(iva,ipoin)=unkno(iva,jpoin)+bppnr(iva,ippne)
 1410 continue
c
c     ----end of loop over the receiving points
c     
 1400 continue
c$acc  end kernels
c
c
c     ----end of loop over the passes
c     
 1199 continue
 1200 continue
 1201 continue
c
      return
      end

This code does not work either. But when I add a print statement as follows:

 
       do 1200 ippas=1,50
       print *,'IPPAS=',ippas
       ....

Everything works as expected… I am not really interested in the performance of this subroutine, but I am trying to avoid copying between device and host. Is there any way to write this part of the code more explicitly, in order to generate clearer code for the device?

This seems more likely to be a compiler code generation issue. My best guess is that the loop index variables are being optimized in a way that causes the error, and that adding the print inhibits this optimization.
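In the meantime, one way to spell the region out more explicitly is sketched below. It is not a confirmed workaround for the problem in this thread. It assumes free-form source, replaces the goto branches with exit and cycle, lists the arrays in a present clause and marks the scalars private. The program name, the tiny array sizes, mpass and the test data are all hypothetical.

```fortran
program explicit_serial_sketch
  implicit none
  ! Hypothetical sizes and data, chosen to keep the example small.
  integer, parameter :: nunkp = 2, npoin = 4, mppne = 3, mpass = 50
  integer :: bppni(2, mppne), lppas(mpass)
  real*8  :: bppnr(nunkp, mppne), unkno(nunkp, npoin)
  integer :: ippas, ippne, iva, ipoin, jpoin, nppn0, nppn1

  ! Point 1 receives from 2, point 2 from 3, point 3 from 4.
  bppni(:, 1) = (/ 1, 2 /)
  bppni(:, 2) = (/ 2, 3 /)
  bppni(:, 3) = (/ 3, 4 /)
  bppnr = 0.0d0
  lppas = 0
  lppas(1) = 2        ! pass 1 ends at entry 2
  lppas(2) = 3        ! pass 2 ends at entry 3
  do ipoin = 1, npoin
    unkno(:, ipoin) = 10.0d0 * ipoin
  end do

  !$acc data copyin(bppni, bppnr, lppas) copy(unkno)
  !$acc serial present(bppni, bppnr, lppas, unkno) private(nppn0, nppn1, ipoin, jpoin)
  nppn1 = 0
  do ippas = 1, mpass
    nppn0 = nppn1 + 1
    nppn1 = lppas(ippas)
    if (nppn1 == 0) exit        ! replaces "goto 1201"
    if (nppn0 > nppn1) cycle    ! replaces "goto 1199"
    !$acc loop seq
    do ippne = nppn0, nppn1
      ipoin = bppni(1, ippne)
      jpoin = bppni(2, ippne)
      !$acc loop seq
      do iva = 1, nunkp
        unkno(iva, ipoin) = unkno(iva, jpoin) + bppnr(iva, ippne)
      end do
    end do
  end do
  !$acc end serial
  !$acc end data

  print *, 'unkno(1,:) =', unkno(1, :)
end program explicit_serial_sketch
```

With this test data, a host run (directives ignored) leaves unkno(1,:) holding 20, 30, 40 and 40, so a device build can be checked against that. In the real code the arrays are already on the device, so the data region here would be replaced by whatever enter data and update directives the caller uses.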

If I asked our customer support team to contact you directly, could you send us the full source? If it is a compiler issue, I’d like to get it reported and fixed.

Thanks,
Mat

Hi Mat,

Thank you for your answer. I will ask for permission to send the code, but I think
this will not be a problem. Please ask the support team to contact me.

Best,
Alejandro