Hi,
nvc++ produces an incorrect result when a reduction is performed with “target teams distribute parallel for” inside a “parallel” region.
The problem occurs even when the environment variable OMP_NUM_THREADS is set to 1.
Here is a minimal reproducer (essai.c):
    #include <stdio.h>

    int main(void) {
        #pragma omp parallel
        {
            int sum = 0;
            // #pragma omp parallel for reduction(+:sum)         // => OK on CPU
            // #pragma omp target parallel for reduction(+:sum)  // => OK
            #pragma omp target teams distribute parallel for reduction(+:sum)  // => WRONG
            for (int i = 0; i < 20000; i++) {
                sum += i;
            }
            printf("sum2 = %d\n", sum);
        }
        return 0;
    }
Running
    OMP_NUM_THREADS=1 ./essai
prints
    sum2 = 19232
instead of the expected
    sum2 = 199990000
If I comment out the outer “#pragma omp parallel”, the result is correct (sum2 = 199990000).
The result is also correct if the reduction is done with “target parallel for reduction(+:sum)” instead.
I compile with: “nvc++ -mp=gpu -O3 essai.c -o essai”
“nvc++ --version” reports:
    nvc++ 25.1-0 64-bit target on x86-64 Linux -tp cascadelake
Mickaêl