With versions 21.3 and 21.5 of the NVHPC compilers, the following program:
#include <cstdio>
int main() {
  std::size_t const N = 2;
  printf("NVHPC %d.%d.%d\n", __NVCOMPILER_MAJOR__, __NVCOMPILER_MINOR__, __NVCOMPILER_PATCHLEVEL__);
  for ( auto mode = 0; mode < 2; ++mode ) {
    bool enable_gpu{mode};
    const char* name{enable_gpu ? "GPU" : "CPU"};
    int outer{};
    #pragma acc parallel loop copy(outer) if(enable_gpu)
    for ( auto i = 0; i < N; ++i ) {
      int local = 42;
      #pragma acc atomic capture
      local = outer++;
      printf("local = %d, outer = %d\n", local, outer);
    }
    printf("%s: %s\n", name, outer == N ? "PASS" : "FAIL");
  }
  return 0;
}
gives incorrect results when compiled with nvc++ -o test test.cpp -acc:
NVHPC 21.5.0
local = 0, outer = 0
local = 0, outer = 0
CPU: FAIL
local = 0, outer = 1
local = 1, outer = 2
GPU: PASS
and
NVHPC 21.3.0
local = 0, outer = 0
local = 0, outer = 0
CPU: FAIL
local = 0, outer = 1
local = 1, outer = 2
GPU: PASS
while with NVHPC 21.2 it gives correct results:
NVHPC 21.2.0
local = 0, outer = 1
local = 1, outer = 2
CPU: PASS
local = 0, outer = 1
local = 1, outer = 2
GPU: PASS
This seems like an alarming regression: with 21.3 and 21.5 the counter is never incremented when the region runs on the CPU. Do you agree that this is a bug, or is the test case doing something undefined?
We can temporarily roll back to 21.2, but it would be great to get this fixed in the next release.
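For reference, the behaviour the test appears to rely on can be sketched in plain C++ without OpenACC. This is only an illustration under the assumption that `local = outer++` under `atomic capture` means "read the old value and increment, as one atomic step"; the names `CaptureResult` and `expected_capture` are hypothetical and not part of the original test.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in the original test): runs the same loop with
// std::atomic instead of OpenACC and records what each iteration captures.
struct CaptureResult {
    std::vector<int> captured;  // value of `local` seen in each iteration
    int outer;                  // final value of the shared counter
};

CaptureResult expected_capture(std::size_t n) {
    std::atomic<int> outer{0};
    CaptureResult result{};
    for (std::size_t i = 0; i < n; ++i) {
        // Assumed equivalent of `local = outer++` under `atomic capture`:
        // fetch_add returns the old value and increments the counter.
        int local = outer.fetch_add(1);
        result.captured.push_back(local);
    }
    result.outer = outer.load();
    return result;
}
```

Called as `expected_capture(2)` on a single thread, this captures 0 and then 1 and leaves the counter at 2, which matches the 21.2 output above. In a parallel run the order of the captured values could differ, but the final counter should still equal N.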