OpenACC directives and their compiler messages

Hello,
Hope you are having a wonderful week.
Could you kindly help me understand why my -Minfo output changes between compiling the code on its own and compiling that same code as a subroutine of a bigger program? It makes use of OpenACC directives like this:

      ALLOCATE( multtwobdmat(3 * n, 3 * nbeams))
      !$acc data copyin(twobdmat,kbeam) copyout(multtwobdmat) 
      !$acc parallel loop
      Do i = 1 , 3 * n                              
      Do j = 1 , 3 * nbeams                         
      !$acc loop
      Do ii = 1 , 3 * nbeams                        
      multtwobdmat(i,j)=multtwobdmat(i,j)+twobdmat(i,ii)*kbeam(ii,j)
      end do
      end do
      end do
      !$acc end data
      ALLOCATE( multtwothreebdmat(3 * ncol, 3 * nbeams))
      !$acc data copyin(twothreebdmat,kbeam) copyout(multtwothreebdmat)
      !$acc parallel loop
      Do i = 1 , 3 * ncol                                        
      Do j = 1 , 3 * nbeams                                     
      !$acc loop
      Do ii = 1 , 3 * nbeams                                
      multtwothreebdmat(i,j) = multtwothreebdmat(i,j)+ (twothreebdmat(i,ii)) * (kbeam(ii,j))
      end do
      end do
      end do
      !$acc end data

When the code was compiled on its own with the following flags,

pgfortran -o GENA213.exe GENA213.cuf -fast -Minfo=opt -ta:tesla:cc50 -Minfo=accel -lcula_lapack_pgfortran

the compiler produced these messages:


     ....
    129, Zero trip check eliminated
    138, Generating copyout(multtwobdmat(:,:))
         Generating copyin(twobdmat(:,:),kbeam(:,:))
    140, Accelerator kernel generated
         Generating Tesla code
        141, !$acc loop gang ! blockidx%x
        142, !$acc loop seq
        144, !$acc loop vector(128) ! threadidx%x
    142, Loop is parallelizable
    144, Loop is parallelizable
    153, Generating copyin(twothreebdmat(:,:))
         Generating copyout(multtwothreebdmat(:,:))
         Generating copyin(kbeam(:,:))
    155, Accelerator kernel generated
         Generating Tesla code
        156, !$acc loop gang ! blockidx%x
        157, !$acc loop seq
        159, !$acc loop vector(128) ! threadidx%x
    157, Loop is parallelizable
    159, Loop is parallelizable
     ....


Now that I have placed it in a subroutine, I am compiling it with the following:

pgfortran -Mcuda -Mlarge_arrays -o PL.exe  PL.for -fast -ta:tesla:cc50 -acc -Minfo=all -lcula_lapack_pgfortran

I am getting this:

...
   7910,Loop not fused: function call before adjacent loop
         Generated an alternate version of the loop
         Generated vector simd code for the loop
         Generated 2 prefetch instructions for the loop
         Generated vector simd code for the loop
         Generated 2 prefetch instructions for the loop
         FMA (fused multiply-add) instruction(s) generated
   7921, Loop not fused: function call before adjacent loop
   7924, Zero trip check eliminated
         Generated an alternate version of the loop
         Generated vector simd code for the loop containing reductions
         Generated a prefetch instruction for the loop
         Generated vector simd code for the loop containing reductions
         Generated a prefetch instruction for the loop
         FMA (fused multiply-add) instruction(s) generated
...

I have not changed anything in the code, except that the first file is “.cuf” and the second is “.for”. Is it possible that this is the difference? Did the compiler understand the directives and just not show me the messages?

Can you please help me understand the messages produced the second time? Things like “Loop not fused: function call before adjacent loop” and “Loop interchange produces reordered loop nest: 7938,7940,7937”.
I cannot find any source online that explains these messages.

Thank you for your time.
Ahmed

Hi Ahmed,

You did change the -Minfo flag between the two compilations from “-Minfo=opt -Minfo=accel” to “-Minfo=all”. “all” includes “-Minfo=vect”, i.e. vectorization, and “-Minfo=loop”, i.e. loop optimizations, which is what the new messages are reporting. I suspect that if you go back to the first compile and use “-Minfo=all”, you’ll see the same messages.

Loop not fused: function call before adjacent loop

This indicates that the two loops could not be fused due to an intervening call.
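
As a rough illustration, here is a minimal sketch of the kind of code shape that can lead to this message. The program, array names, and the `report` routine are hypothetical and not taken from your source, and whether a given compiler version prints exactly this message for it is an assumption on my part. The point is only that two loops with the same bounds are separated by a call, so they are not adjacent from the optimizer's point of view.

```fortran
program fusion_sketch
  implicit none
  integer, parameter :: n = 8   ! hypothetical problem size
  real :: a(n), b(n)
  integer :: i

  ! First loop: fills a
  do i = 1, n
     a(i) = real(i)
  end do

  ! Intervening call between the two loops. The optimizer generally
  ! has to assume the call may have side effects, so it keeps the
  ! loops on either side of it separate.
  call report(a, n)

  ! Second loop: same bounds as the first, so it would otherwise be
  ! a candidate for fusion with it
  do i = 1, n
     b(i) = 2.0 * a(i)
  end do

  print *, 'sum of b =', sum(b)

contains

  subroutine report(x, m)
    integer, intent(in) :: m
    real, intent(in) :: x(m)
    print *, 'sum of a =', sum(x)
  end subroutine report

end program fusion_sketch
```

The message is informational only. It does not mean anything is wrong with the code, just that one possible optimization was not applied.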

Loop interchange produces reordered loop nest: 7938,7940,7937

This tells you that the compiler reordered (interchanged) the loops so the loop at 7937 is now the inner-most loop.
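
To make that concrete, here is a small sketch of a loop nest where interchange is a plausible optimization. The names and sizes are hypothetical, and I am not claiming the compiler will print the same message for it. Fortran stores arrays in column-major order, so the first index is the contiguous one. With `i` as the outer loop and `j` as the inner loop, the inner loop strides across columns, and a compiler may swap the two loops so that `i` becomes the inner-most loop and memory is accessed with unit stride.

```fortran
program interchange_sketch
  implicit none
  integer, parameter :: n = 4, m = 3   ! hypothetical sizes
  real :: c(n, m)
  integer :: i, j

  c = 0.0

  ! As written: i is the outer loop and j is the inner loop, so
  ! consecutive inner iterations touch c(i,1), c(i,2), ... which
  ! are n elements apart in memory.
  do i = 1, n
     do j = 1, m
        c(i, j) = c(i, j) + real(i + j)
     end do
  end do

  ! After interchange the compiler would effectively run
  !   do j = 1, m
  !      do i = 1, n
  ! so that the inner loop walks down a column contiguously.
  ! The result is the same either way, because each c(i,j) is
  ! updated independently of the others.

  print *, 'sum of c =', sum(c)
end program interchange_sketch
```

In your message, the line numbers are listed in the new order, ending with the inner-most loop, so the loop that you wrote as the outer loop at 7937 has been moved to the inside.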

Hope this helps,
Mat