Compiler Messages

I have the following code that I am looking to accelerate:
INTEGER, ALLOCATABLE :: yxv(:,:,:)
COMPLEX(kind=cx_kind),ALLOCATABLE :: k(:,:,:)
ALLOCATE( yxv(0:NDIR-1, 1:order+1, 1:nterms) )
ALLOCATE( k(0:NDIR-1, 1:order+1, 1:npoints) )

!$acc region copy(k,yxv)
!$acc do parallel
DO px = 1,npoints
DO ix = 1,nterms
DO rx = 1,order+1
DO mux = 0,NDIR-1
k(mux,rx,px) = k(mux,rx,px)*yxv(mux,rx,ix)
END DO
END DO
END DO
END DO
!$acc end region

I get the following messages when I compile, and the kernel fails to launch when I run it:
151, Generating copy(k(:,:,:))
Generating copy(yxv(:,:,:))
154, Accelerator restriction: scalar variable live-out from loop: ix
Accelerator restriction: scalar variable live-out from loop: .dY0003
Loop carried dependence due to exposed use of ‘k(0:3,1:order+1,i1+1)’ prevents parallelization
155, Accelerator restriction: scalar variable live-out from loop: rx
Accelerator restriction: scalar variable live-out from loop: .dY0004
156, Accelerator restriction: scalar variable live-out from loop: mux
Accelerator restriction: scalar variable live-out from loop: .dY0005
Inner sequential loop scheduled on accelerator
Accelerator kernel generated
153, !$acc do parallel
154, !$acc do seq
Non-stride-1 accesses for array ‘yxv’
155, !$acc do seq
156, !$acc do seq
157, Accelerator restriction: induction variable live-out from loop: px
Accelerator restriction: induction variable live-out from loop: ix
Accelerator restriction: induction variable live-out from loop: rx
Accelerator restriction: induction variable live-out from loop: mux
158, Accelerator restriction: induction variable live-out from loop: mux
Accelerator restriction: induction variable live-out from loop: .dY0005
159, Accelerator restriction: induction variable live-out from loop: rx
Accelerator restriction: induction variable live-out from loop: .dY0004
160, Accelerator restriction: induction variable live-out from loop: ix
Accelerator restriction: induction variable live-out from loop: .dY0003
161, Accelerator restriction: induction variable live-out from loop: px
Accelerator restriction: induction variable live-out from loop: .dY0002

What do these compiler messages mean, especially “induction variable live-out from loop”?

Hi Karthee,

I tried to recreate your errors from the above code snippet, but my example accelerated without problems. I’ll need to see more code to give you a better answer.

154, Accelerator restriction: scalar variable live-out from loop: ix

This typically means that you’re using the “ix” variable on the right hand side after the acc region.

Something like

!$acc region
do ix = 1,N

end do
!$acc end region
write(*,*) ix

This is illegal since each thread has its own “ix” and it’s impossible to tell which “ix” to use. Some workarounds are to use the “private” clause to privatize the variable, remove the RHS use, or change the variable name.
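As a minimal sketch of two of these workarounds (privatizing “ix” and not reading it after the region), assuming the “private” clause is accepted on the “do” directive as described above. The array a, the bound n, and the program name are made up for illustration:

```fortran
program liveout_sketch
  implicit none
  ! n and a are hypothetical names used only for this illustration.
  integer, parameter :: n = 8
  integer :: ix
  real :: a(n)

  !$acc region
  !$acc do private(ix)
  do ix = 1, n
     a(ix) = real(ix)
  end do
  !$acc end region

  ! ix is not read after the region; the known bound n is used instead,
  ! so no thread-local value of ix has to be live-out from the loop.
  write(*,*) 'last element:', a(n)
end program liveout_sketch
```

Without an accelerator-enabled compiler the directives are treated as comments, so the sketch still builds and runs on the host.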

Accelerator restriction: scalar variable live-out from loop: .dY0003

This is a compiler generated temporary variable for an optimized ‘ix’. This message should go away once the “ix” variable is fixed.

Loop carried dependence due to exposed use of ‘k(0:3,1:order+1,i1+1)’ prevents parallelization

Here the compiler is telling you that there is a loop dependency which will prevent the “ix” loop from parallelizing.

154, !$acc do seq

Due to the loop dependency, the compiler is forced to schedule the “ix” loop sequentially. If you invert the “px” and “ix” loops (making “ix” the outermost loop) you’ll be able to get all four loops to parallelize. Also, the “yxv” array will be placed in cache.

Alternatively, you can make “ix” the innermost loop. This will allow the “px”, “rx”, and “mux” loops to parallelize, with the “ix” loop being scheduled sequentially on the GPU. This will give you a higher occupancy.

Note that these are just suggestions. Please experiment.
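The “ix innermost” variant could be sketched roughly as follows. The array sizes, the initial values, the default complex kind, and the scalar prod are assumptions for illustration only; the accumulation into a scalar is one way to write it, not necessarily what the compiler needs:

```fortran
program reorder_sketch
  implicit none
  ! Small hypothetical sizes chosen only so the sketch is self-contained.
  integer, parameter :: NDIR = 4, order = 2, nterms = 3, npoints = 5
  integer :: px, ix, rx, mux
  integer, allocatable :: yxv(:,:,:)
  complex, allocatable :: k(:,:,:)
  complex :: prod

  allocate( yxv(0:NDIR-1, 1:order+1, 1:nterms) )
  allocate( k(0:NDIR-1, 1:order+1, 1:npoints) )
  yxv = 2
  k = (1.0, 0.0)

  !$acc region copy(k), copyin(yxv)
  do px = 1, npoints
     do rx = 1, order+1
        do mux = 0, NDIR-1
           ! Each (mux,rx,px) element is independent of the others,
           ! so the three outer loops carry no dependence.
           prod = k(mux,rx,px)
           ! The ix loop is now innermost and runs sequentially per element.
           do ix = 1, nterms
              prod = prod * yxv(mux,rx,ix)
           end do
           k(mux,rx,px) = prod
        end do
     end do
  end do
  !$acc end region

  write(*,*) k(0,1,1)
end program reorder_sketch
```

Since “yxv” is only read inside the region, the sketch uses copyin for it rather than copy; whether that matters for performance would need to be measured.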

Non-stride-1 accesses for array ‘yxv’

The schedule is preventing ‘yxv’ memory from being accessed contiguously. This may cause performance issues.

Hope this helps,
Mat

After private discussion with the poster, the problem here was the use of a CONTAINS statement (similar to my question that you already answered, thanks).

Hi,

I am also having trouble with some private variables, and getting live-out from loop messages.

Here is the kernel:

      !$acc region do kernel, parallel &
      !$acc& vector(256), private(ip,k,i,j), independent
      DO ip = 1, ipend
         DO k = 1, ke         
            i = mind_ilon(ip,ib)
            j = mind_jlat(ip,ib)
         
          t_b(ip,k) =  t(i,j,k,nx)
          p_b(ip,k) = p0(i,j,k)+pp(i,j,k,nx)
         qv_b(ip,k) = qv(i,j,k,nx)
         qc_b(ip,k) = qc(i,j,k,nx)

         rho_b(ip,k) = rho(i,j,k)
        ENDDO
      ENDDO
      !$acc end region

And the compiler messages are as follows:

384, Accelerator restriction: scalar variable live-out from loop: .dY0004
Accelerator kernel generated
384, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
CC 1.3 : 32 registers; 20 shared, 532 constant, 0 local memory bytes; 50% occupancy
CC 2.0 : 52 registers; 4 shared, 560 constant, 0 local memory bytes; 33% occupancy
395, Accelerator restriction: induction variable live-out from loop: .dY0004
396, Accelerator restriction: induction variable live-out from loop: .dY0003

Line 384 corresponds to “DO ip = 1, ipend”.


It seems that the compiler is actually creating a parallel region, but I am also getting this live-out restriction message, which is somewhat strange.
Also, how can I relate the intermediate .dY0004 to my actual variables?

Thanks,

Xavier

Hi Xavier,

.dY0003 and .dY0004 are compiler temporary variables, though I’m not sure why they are needed since it’s a simple loop. My best guess is that the assignments of i and j are being hoisted out of the inner loop, but I’ll need a full reproducible example to see what’s going on.

One thing I’d like you to try is changing your acc directive to just “!$acc region do”. Both the ip and k loops are parallelizable, so the “independent” and “kernel” clauses shouldn’t be needed. Also, since scalars are privatized by default, you shouldn’t need the private clause.
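For illustration, a cut-down version of the kernel with only the simplified directive might look like this. The array sizes, the initial values, and the dropped “ib” and “nx” indices are assumptions made to keep the sketch self-contained, and the end directive follows the form used in the post above:

```fortran
program simple_directive_sketch
  implicit none
  ! Hypothetical sizes; the real application arrays are larger and have more dimensions.
  integer, parameter :: ipend = 4, ke = 3, ie = 5, je = 5
  integer :: ip, k, i, j
  integer :: mind_ilon(ipend), mind_jlat(ipend)
  real :: t(ie,je,ke), t_b(ipend,ke)

  mind_ilon = (/ 1, 2, 3, 4 /)
  mind_jlat = (/ 4, 3, 2, 1 /)
  t = 1.0

  ! No kernel, independent, vector, or private clauses:
  ! the scalars i and j are left to the default privatization.
  !$acc region do
  do ip = 1, ipend
     do k = 1, ke
        i = mind_ilon(ip)
        j = mind_jlat(ip)
        t_b(ip,k) = t(i,j,k)
     end do
  end do
  !$acc end region

  write(*,*) t_b(1,1)
end program simple_directive_sketch
```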

  • Mat

Hi Mat,

Thanks for your fast response! I had already tried with only
“!$acc region do”, but then it executes serially:

    382, Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
    383, Accelerator restriction: scalar variable live-out from loop: .dY0004
         Accelerator kernel generated
        383, !$acc do seq
             Non-stride-1 accesses for array 'mind_jlat'
             Non-stride-1 accesses for array 'mind_ilon'
             CC 1.3 : 32 registers; 0 shared, 528 constant, 0 local memory bytes; 25% occupancy
             CC 2.0 : 52 registers; 0 shared, 560 constant, 0 local memory bytes; 16% occupancy
    394, Accelerator restriction: induction variable live-out from loop: .dY0004
    395, Accelerator restriction: induction variable live-out from loop: .dY0003

We are now putting all the test kernels into our application, and this problem only appears with the full code. I am not able to reproduce this behaviour in a smaller test code. I can send the full application though.

Xavier

I can send the full application though.

Thanks. I’ll watch TRS mail.

  • Mat