OpenACC directives and their compiler messages

torkyahmad · September 25, 2018, 1:51pm

Hello,
Hope you are having a wonderful week.
Can you kindly help me understand why my -Minfo output changes between compiling the code on its own and compiling the same code as a subroutine of a bigger program? It uses OpenACC directives like this:

      ALLOCATE( multtwobdmat(3 * n, 3 * nbeams))
      !$acc data copyin(twobdmat,kbeam) copyout(multtwobdmat) 
      !$acc parallel loop
      Do i = 1 , 3 * n                              
      Do j = 1 , 3 * nbeams                         
      !$acc loop
      Do ii = 1 , 3 * nbeams                        
      multtwobdmat(i,j)=multtwobdmat(i,j)+twobdmat(i,ii)*kbeam(ii,j)
      end do
      end do
      end do
      !$acc end data
      ALLOCATE( multtwothreebdmat(3 * ncol, 3 * nbeams))
      !$acc data copyin(twothreebdmat,kbeam) copyout(multtwothreebdmat)
      !$acc parallel loop
      Do i = 1 , 3 * ncol                                        
      Do j = 1 , 3 * nbeams                                     
      !$acc loop
      Do ii = 1 , 3 * nbeams                                
      multtwothreebdmat(i,j) = multtwothreebdmat(i,j)+ (twothreebdmat(i,ii)) * (kbeam(ii,j))
      end do
      end do
      end do
      !$acc end data

When I compiled the code on its own with the following flags,

pgfortran -o GENA213.exe GENA213.cuf -fast -Minfo=opt -ta:tesla:cc50 -Minfo=accel -lcula_lapack_pgfortran

I got these messages:

     ....
    129, Zero trip check eliminated
    138, Generating copyout(multtwobdmat(:,:))
         Generating copyin(twobdmat(:,:),kbeam(:,:))
    140, Accelerator kernel generated
         Generating Tesla code
        141, !$acc loop gang ! blockidx%x
        142, !$acc loop seq
        144, !$acc loop vector(128) ! threadidx%x
    142, Loop is parallelizable
    144, Loop is parallelizable
    153, Generating copyin(twothreebdmat(:,:))
         Generating copyout(multtwothreebdmat(:,:))
         Generating copyin(kbeam(:,:))
    155, Accelerator kernel generated
         Generating Tesla code
        156, !$acc loop gang ! blockidx%x
        157, !$acc loop seq
        159, !$acc loop vector(128) ! threadidx%x
    157, Loop is parallelizable
    159, Loop is parallelizable
     ....


Now that I have placed it in a subroutine, I am compiling it with the following:

pgfortran -Mcuda -Mlarge_arrays -o PL.exe  PL.for -fast -ta:tesla:cc50 -acc -Minfo=all -lcula_lapack_pgfortran

I am getting this:

...
   7910, Loop not fused: function call before adjacent loop
         Generated an alternate version of the loop
         Generated vector simd code for the loop
         Generated 2 prefetch instructions for the loop
         Generated vector simd code for the loop
         Generated 2 prefetch instructions for the loop
         FMA (fused multiply-add) instruction(s) generated
   7921, Loop not fused: function call before adjacent loop
   7924, Zero trip check eliminated
         Generated an alternate version of the loop
         Generated vector simd code for the loop containing reductions
         Generated a prefetch instruction for the loop
         Generated vector simd code for the loop containing reductions
         Generated a prefetch instruction for the loop
         FMA (fused multiply-add) instruction(s) generated
...

I have not changed anything in the code, except that the first file is “.cuf” and the second is “.for”. Could that be the difference? Did the compiler understand the directives and just not show me the messages?

Can you please help me understand the messages produced the second time? Things like “Loop not fused: function call before adjacent loop” and “Loop interchange produces reordered loop nest: 7938,7940,7937”.
I cannot find any source online that explains these messages.

Thank you for your time.
Ahmed

MatColgrove · September 25, 2018, 3:05pm

Hi Ahmed,

You did change the -Minfo flag between the two compilations, from “-Minfo=opt -Minfo=accel” to “-Minfo=all”. “all” includes “-Minfo=vect”, i.e. vectorization, and “-Minfo=loop”, i.e. loop optimizations, which is what the new messages are reporting. I suspect that if you go back to the first compile and use “-Minfo=all”, you’ll see the same messages.

“Loop not fused: function call before adjacent loop”

This indicates that the two loops could not be fused due to an intervening call.
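
To make that concrete, here is a minimal sketch of the pattern, using a hypothetical program rather than the code from the post. The names `fusion_sketch`, `report`, `a`, and `b` are made up for illustration. Two adjacent loops with the same bounds are normally candidates for fusion into a single loop, but the call between them has to stay between them, so the loops are kept separate. Whether a particular compiler fuses, inlines, or reports anything for this exact program is an optimization decision and may differ.

```fortran
program fusion_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n)
  integer :: i

  ! First loop: fill a.
  do i = 1, n
    a(i) = real(i)
  end do

  ! Intervening call. It must run after the first loop has finished
  ! and before the second loop starts, which is what blocks fusion.
  call report(a, n)

  ! Second loop: same bounds as the first. Without the call above,
  ! the two loop bodies could be merged into one loop over i.
  do i = 1, n
    b(i) = 2.0 * a(i)
  end do

  print *, 'sum of b =', sum(b)

contains

  ! Hypothetical helper standing in for any call between the loops.
  subroutine report(x, m)
    integer, intent(in) :: m
    real, intent(in) :: x(m)
    print *, 'sum of a =', sum(x)
  end subroutine report

end program fusion_sketch
```

The message is informational: the code is still correct, the compiler just did not combine the loops.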

“Loop interchange produces reordered loop nest: 7938,7940,7937”

This tells you that the compiler reordered (interchanged) the loops so the loop at 7937 is now the inner-most loop.
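
As a rough illustration of what such an interchange looks like, the sketch below writes a small matrix product twice: once in the order i, j, k as in the source, and once in the order j, k, i, with the original outermost loop moved innermost. This is a hypothetical stand-alone program (the names `interchange_sketch`, `a`, `b`, `c`, and `c2` are assumptions, not taken from the post), and the second nest is written by hand to mimic the transformation, not actual compiler output. Fortran stores arrays column by column, so making the first index the innermost loop gives stride-1 access to `c(i, j)` and `a(i, k)`, which is a likely reason for a compiler to choose this order.

```fortran
program interchange_sketch
  implicit none
  integer, parameter :: n = 4, m = 3
  real :: a(n, m), b(m, m), c(n, m), c2(n, m)
  integer :: i, j, k

  ! Deterministic small-integer inputs, so both results are exact.
  do k = 1, m
    do i = 1, n
      a(i, k) = real(i + k)
    end do
  end do
  do j = 1, m
    do k = 1, m
      b(k, j) = real(k * j)
    end do
  end do

  ! Accumulators must start from zero.
  c  = 0.0
  c2 = 0.0

  ! Order as written in the source: i outermost, k innermost.
  do i = 1, n
    do j = 1, m
      do k = 1, m
        c(i, j) = c(i, j) + a(i, k) * b(k, j)
      end do
    end do
  end do

  ! Interchanged order: the former outermost loop (i) is now innermost.
  do j = 1, m
    do k = 1, m
      do i = 1, n
        c2(i, j) = c2(i, j) + a(i, k) * b(k, j)
      end do
    end do
  end do

  ! Both orders add the same terms in the same k order for each c(i, j).
  print *, 'results match: ', all(c == c2)
end program interchange_sketch
```

The interchange changes only the order in which iterations run, not the result, which is why the compiler is allowed to do it on its own.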

Hope this helps,
Mat
