Please help me understand "induction variable live-out from loop: .inl_p_5"

I have this code example that triggers an info message that I do not understand. This is a reproducer for a similar message I am trying to understand from production code. Here is the code (acc-function-struct-test.c):

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int x[2][10];
    int y[2][10];
    int s[2][10];
} some_settings;

#pragma acc routine seq
static int lookup(int n, int x[], int y[], int s[]) {
    int p;
    for (p=0; p<n; ++p) {
        if (x[p] > y[p]) break;
    }
    return x[p] + y[p] + s[p];
}

void foo(int n, some_settings * settings, int outp[]) {
    #pragma acc parallel loop present(settings[0:1], outp[0:n])
    for (int i=0; i<n; ++i) {
        outp[i] = lookup(i, &settings->x[0][0], &settings->y[0][0], &settings->s[0][0]);
    }
}

int main() {
    some_settings * settings = malloc(sizeof(some_settings));
    for (int i=0; i<10; ++i) {
        settings->x[0][i] = i;
        settings->x[1][i] = 100+i;
        settings->y[0][i] = 20+i;
        settings->y[1][i] = 200+i;
        settings->s[0][i] = 40+i;
        settings->s[1][i] = 400+i;
    }
    #pragma acc enter data copyin(settings[0:1])
    int outp[10] = {0};
    #pragma acc enter data copyin(outp[0:10])

    foo(10, settings, outp);
    #pragma acc exit data copyout(outp[0:10])
    for (int i=0; i<10; ++i) {
        printf("outp[%d] is %d\n", i, outp[i]);
    }
    #pragma acc exit data delete(settings[0:1])
}

If I compile this with optimizations (both 24.3 and 24.5) I’m getting this output:

$ nvc -Minfo=all -O3 -acc acc-function-struct-test.c -o acc-function-struct-test
foo:
     19, Generating present(settings[:1],outp[:n])
         Generating implicit firstprivate(i,n)
         Generating NVIDIA GPU code
         21, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
     22, lookup inlined, size=8 (inline) file acc-function-struct-test.c (11)
          13, Accelerator restriction: induction variable live-out from loop: .inl_p_5
              Loop not vectorized/parallelized: potential early exits
          15, Accelerator restriction: induction variable live-out from loop: .inl_p_5
main:
     28, Loop not fused: function call before adjacent loop
         Generated vector simd code for the loop
     37, Generating enter data copyin(settings[:1])
     40, Generating exit data copyout(outp[:])
         Generating enter data copyin(outp[:])
     42, Loop not vectorized/parallelized: contains call
     46, Generating exit data delete(settings[:1])

The inlining of the lookup function and its local variable p into the accelerator region in foo seems to trigger "Accelerator restriction: induction variable live-out from loop: .inl_p_5". The code seems to be working correctly, but usually this is a serious message that should be addressed. Can I ignore it in this case? What is this message trying to tell me? With -O0 no inlining happens and the message is gone.

As you surmised, this is the inlined “p” variable from “lookup”. While the messages are not entirely clear, the live-out is for lookup’s “p” loop, and just means that it can’t be implicitly parallelized. However, the outer parallel region in foo is getting parallelized and offloaded correctly.
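To make "live-out" concrete, here is a minimal sketch in plain C (no OpenACC directives). The function names are hypothetical and not part of the reproducer. The first function has the same shape as lookup's loop: the counter p is read after the loop ends, so its final value is live-out. The second keeps the result in a separate variable, so the counter is never used outside the loop. Whether this changes the nvc diagnostic is an assumption I have not verified; the early break alone still prevents automatic parallelization.

```c
#include <assert.h>

/* Same shape as the loop in lookup: p is read after the loop,
   so the induction variable is "live-out". */
static int first_greater_liveout(int n, const int x[], const int y[]) {
    assert(n >= 0);
    int p;
    for (p = 0; p < n; ++p) {
        if (x[p] > y[p]) break;
    }
    return p; /* p == n when no element matches */
}

/* Hypothetical variant: the result lives in idx, and the loop
   counter p is scoped to the loop only. */
static int first_greater_separate(int n, const int x[], const int y[]) {
    assert(n >= 0);
    int idx = n; /* default when no element matches */
    for (int p = 0; p < n; ++p) {
        if (x[p] > y[p]) { idx = p; break; }
    }
    return idx;
}
```

Both functions return the index of the first element where x exceeds y, or n if there is none, so they can be swapped freely in sequential code.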

You can ignore the live-out messages, unless you want to rewrite the loop so it is parallelizable, something like:

#pragma acc routine vector
static int lookup(int n, int x[], int y[], int s[]) {
    int p;
    int idx = n;
    #pragma acc loop vector reduction(min:idx)
    for (p=0; p<n; ++p) {
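        /* Keep only the smallest matching index, so a later match cannot overwrite an earlier one. */
        if (x[p] > y[p] && p < idx) { idx = p; }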
    }
    return x[idx] + y[idx] + s[idx];
}

Now your current code is more efficient so I wouldn’t actually change it. I’m just showing how it could be parallelized. However, if “n” is much larger in your production code, then changing it to a parallel vector loop might be beneficial.
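As an illustration of why a min reduction can stand in for the early break, the sketch below emulates the reduction in plain C. The lane count, the strided mapping of iterations to lanes, and the function name are assumptions made for the example; the real mapping is chosen by the compiler and runtime. Each lane starts from the identity of min (INT_MAX), keeps the smallest matching index it sees, and the partial results are then combined with the initial value n.

```c
#include <assert.h>
#include <limits.h>

#define LANES 4 /* hypothetical vector width, for illustration only */

/* Emulates reduction(min:idx): private copies start at INT_MAX,
   each lane visits iterations lane, lane+LANES, ..., and the
   partial results are combined with min. */
static int first_match_min_reduction(int n, const int x[], const int y[]) {
    assert(n >= 0);
    int partial[LANES];
    for (int lane = 0; lane < LANES; ++lane) {
        partial[lane] = INT_MAX;
        for (int p = lane; p < n; p += LANES) {
            /* Only lower the private value, never raise it. */
            if (x[p] > y[p] && p < partial[lane]) partial[lane] = p;
        }
    }
    int idx = n; /* the value before the loop also takes part in the min */
    for (int lane = 0; lane < LANES; ++lane) {
        if (partial[lane] < idx) idx = partial[lane];
    }
    return idx;
}
```

Every iteration is visited even after a match has been found, which is the extra work that makes the sequential early-exit version cheaper for small n.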

Thank you for the clarification.

It took me most of a day to arrive at that conclusion though. I was compiling with -Minfo=accel initially so I didn’t see the inline messages. Googling around this does not really help since there are not many hits and those are different situations.

It was confusing to me why I was getting an info message like that for code that is in an explicitly sequential routine… How would that ever be parallelized? I would’ve valued a clearer message, or none at all. But I’m not sure that’s worth the effort on your end given that it seems to be quite the corner case.
