Profile-feedback has no effect

Hi,

I’m testing profile-feedback with pgcc. I’m working with a simple program
that contains lots of if-else blocks. The version included below
runs for about 15 seconds, while switching to the f2 function
reduces the execution time to 12 seconds.

I would expect profile-feedback to find the correct execution paths
for me, but unfortunately the original version compiled with
profile-feedback still executes in 15 seconds. It seems that
exactly the same code is generated with and without profile-feedback:
I recompiled both cases with the -S option and the assembler output
was identical (up to comments).

I have issued the following commands:

pgcc -O -Mpfi pf.c
./a.out
pgcc -O -Mpfo pf.c

For comparing assembler code:

pgcc -S -O -Mpfo pf.c
mv pf.s saved.s
pgcc -S -O pf.c
diff saved.s pf.s

The compiler’s version is:
pgcc 6.2-5 64-bit target on x86-64 Linux

Am I doing something wrong?

The code follows (it is a slightly modified version of an example from the Sun website):

#include <stdio.h>
#include <stdlib.h>

unsigned f1(unsigned *a0, unsigned *a1, unsigned *a2,
   unsigned *a3, unsigned *a4, unsigned *a5) {
    unsigned result = 0;

    if (a0 != NULL) { result += (*a0); } else { printf("a0 == NULL"); }
    if (a1 != NULL) { result += (*a1); } else { printf("a1 == NULL"); }
    if (a2 != NULL) { result += (*a2); } else { printf("a2 == NULL"); }
    if (a3 != NULL) { result += (*a3); } else { printf("a3 == NULL"); }
    if (a4 != NULL) { result += (*a4); } else { printf("a4 == NULL"); }
    if (a5 != NULL) { result += (*a5); } else { printf("a5 == NULL"); }
    return result;
}

unsigned f2(unsigned *a0, unsigned *a1, unsigned *a2,
  unsigned *a3, unsigned *a4, unsigned *a5) {
    unsigned result = 0;

    if (a0 == NULL) { printf("a0 == NULL"); } else { result += (*a0); }
    if (a1 == NULL) { printf("a1 == NULL"); } else { result += (*a1); }
    if (a2 == NULL) { printf("a2 == NULL"); } else { result += (*a2); }
    if (a3 == NULL) { printf("a3 == NULL"); } else { result += (*a3); }
    if (a4 == NULL) { printf("a4 == NULL"); } else { result += (*a4); }
    if (a5 == NULL) { printf("a5 == NULL"); } else { result += (*a5); }
    return result;
}

int main(int argc, char *argv[]) {
    int i, j, niters = 1, n = 6;
    /* sum starts at 0 so it is defined even if the loop runs zero times */
    unsigned sum = 0, answer = 0, a[6];

    niters = 1000000000;
    if (argc == 2) {
        niters = atoi(argv[1]);
    }

    for (j = 0; j < n; j++) {
        a[j] = rand();
        answer += a[j];
    }

    for (i = 0; i < niters; i++) {
        sum = f1(a + 0, a + 1, a + 2, a + 3, a + 4, a + 5);
    }

    if (sum == answer) {
        printf("answer = %u\n", answer);
    } else {
        printf("error sum=%u, answer=%u", sum, answer);
    }
}

Hi Maciek,

Thank you for the test case. We are looking into ways to improve our profile-guided feedback (PFO), and this appears to be a missed opportunity.

I gathered the following timings on a 2.6 GHz Opteron and a 3 GHz Woodcrest. I modified your code to call both the f1 and f2 functions and inserted a high-resolution timer. I was not able to run icc-built binaries with “-fast” on the Opteron due to Intel’s cpuid check.
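
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing harness. It is not the code used for the timings in this post. The names now_seconds, time_variant, sum_fn and sample_sum are hypothetical, and POSIX clock_gettime with CLOCK_MONOTONIC is assumed as the high-resolution timer.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* assumption: POSIX clock_gettime is available */
#endif
#include <stdio.h>
#include <time.h>

/* Signature shared by f1 and f2 from the test case. */
typedef unsigned (*sum_fn)(unsigned *, unsigned *, unsigned *,
                           unsigned *, unsigned *, unsigned *);

/* Hypothetical stand-in for f1/f2 so the sketch compiles on its own. */
unsigned sample_sum(unsigned *a0, unsigned *a1, unsigned *a2,
                    unsigned *a3, unsigned *a4, unsigned *a5) {
    return *a0 + *a1 + *a2 + *a3 + *a4 + *a5;
}

/* Current monotonic time in seconds. */
double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Call fn niters times on a[0..5], store the last result in *last,
   and return the elapsed wall-clock time in seconds. */
double time_variant(sum_fn fn, unsigned a[6], long niters, unsigned *last) {
    double start = now_seconds();
    unsigned sum = 0;
    long i;

    for (i = 0; i < niters; i++) {
        sum = fn(a + 0, a + 1, a + 2, a + 3, a + 4, a + 5);
    }
    *last = sum;
    return now_seconds() - start;
}
```

Calling time_variant(f1, a, niters, &sum) and then time_variant(f2, a, niters, &sum) from main, and printing the two return values, gives one timing per variant from a single binary. On older Linux systems clock_gettime may need -lrt at link time.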

Opteron              PGCC        ICC
-O0                  23.0 sec    40.0 sec
-O2                  21.7 sec    18.6 sec
-O2 +PFO             21.6 sec    11.2 sec
-fast +IPA*           1.8 sec    n/a
-fast +IPA +PFO       1.8 sec    n/a

Woodcrest            PGCC        ICC
-O0                  16.7 sec    32.6 sec
-O2                  15.6 sec    10.5 sec
-O2 +PFO             15.6 sec     8.7 sec
-fast +IPA*           2.0 sec     4.0 sec
-fast +IPA +PFO       2.0 sec     4.0 sec

* For the PGI “-fast +IPA” rows, the flags used were “-fastsse -Mipa=fast,inline”. ICC “-fast” includes “-ipo” by default.

Don’t read too much into these results, but for this particular code, inlining has the most benefit. That is not to say that we shouldn’t be looking at more opportunities for PFO, but other optimizations are usually more beneficial to a wider group of users.
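
As a rough illustration of why inlining can matter so much here, the sketch below is hand-written and is not compiler output. It assumes the call site passes a + 0 through a + 5, as in the test case; those are addresses of array elements and so can never be NULL. Once the callee body is visible at the call site, every NULL check is provably false and the call can reduce to a plain sum. The names f1_checked, call_site_original and call_site_inlined are hypothetical.

```c
#include <stdio.h>

/* Same shape as f1 in the test case: one NULL check per argument. */
unsigned f1_checked(unsigned *a0, unsigned *a1, unsigned *a2,
                    unsigned *a3, unsigned *a4, unsigned *a5) {
    unsigned result = 0;

    if (a0 != NULL) { result += (*a0); } else { printf("a0 == NULL"); }
    if (a1 != NULL) { result += (*a1); } else { printf("a1 == NULL"); }
    if (a2 != NULL) { result += (*a2); } else { printf("a2 == NULL"); }
    if (a3 != NULL) { result += (*a3); } else { printf("a3 == NULL"); }
    if (a4 != NULL) { result += (*a4); } else { printf("a4 == NULL"); }
    if (a5 != NULL) { result += (*a5); } else { printf("a5 == NULL"); }
    return result;
}

/* The call as written in main: six pointer arguments, six branches. */
unsigned call_site_original(unsigned a[6]) {
    return f1_checked(a + 0, a + 1, a + 2, a + 3, a + 4, a + 5);
}

/* What the call can reduce to after inlining: the arguments are known
   to be non-NULL, so the branches and the printf calls disappear. */
unsigned call_site_inlined(unsigned a[6]) {
    return a[0] + a[1] + a[2] + a[3] + a[4] + a[5];
}
```

Both functions return the same value for any array contents, since unsigned addition wraps the same way in each. Whether a given compiler performs exactly this reduction is an assumption; the timings above only show that the inlining flags help on this code.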

  • Mat