Invalid result of the reduction with teams distribute in a parallel region

Hi,

The nvc++ compiler produces an invalid result when a reduction is performed using “target teams distribute parallel for” in a parallel region.

The problem occurs even when the environment variable is set with “OMP_NUM_THREADS=1”.

A simple reproducer is:

#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        int sum = 0;
        // #pragma omp parallel for reduction(+:sum) // => OK on cpu
        // #pragma omp target parallel for reduction(+:sum) // => OK
        #pragma omp target teams distribute parallel for reduction(+:sum) // => WRONG
        for (int i = 0; i < 20000; i++) {
            sum += i;
        }
        printf("sum2 = %d\n", sum);
    }
    return 0;
}

OMP_NUM_THREADS=1 ./essai
gives

sum2 = 19232

instead of

sum2 = 199990000

If I comment out the first line “#pragma omp parallel”, the result is correct (sum2 = 199990000).

The result is also correct if the reduction uses “target parallel for reduction”.

I use the command: “nvc++ -mp=gpu -O3 essai.c -o essai”

“nvc++ --version” returns
“nvc++ 25.1-0 64-bit target on x86-64 Linux -tp cascadelake”

Mickaêl

Hi Mickaêl,

Sorry, I should have mentioned this when I suggested this as a workaround for the ICE with the loop version.

There’s a core issue with using target offload inside an outer host parallel region, having to do with private variables. This is a specific example of the larger issue. I’ve already noted the wrong answers in the earlier report.

-Mat