Incorrect CPU results with #pragma acc atomic capture

With versions 21.3 and 21.5 of the NVHPC compilers the following program:

#include <cstddef>
#include <cstdio>
int main() {
  std::size_t const N = 2;
  printf("NVHPC %d.%d.%d\n", __NVCOMPILER_MAJOR__, __NVCOMPILER_MINOR__, __NVCOMPILER_PATCHLEVEL__);
  for ( auto mode = 0; mode < 2; ++mode ) {
    bool enable_gpu{mode};
    const char* name{enable_gpu ? "GPU" : "CPU"};
    int outer{};
    #pragma acc parallel loop copy(outer) if(enable_gpu)
    for ( auto i = 0; i < N; ++i ) {
      int local = 42;
      #pragma acc atomic capture
      local = outer++;
      printf("local = %d, outer = %d\n", local, outer);
    }
    printf("%s: %s\n", name, outer == N ? "PASS" : "FAIL");
  }
  return 0;
}

gives incorrect results when compiled with nvc++ -o test test.cpp -acc:

NVHPC 21.5.0
local = 0, outer = 0
local = 0, outer = 0
CPU: FAIL
local = 0, outer = 1
local = 1, outer = 2
GPU: PASS

and

NVHPC 21.3.0
local = 0, outer = 0
local = 0, outer = 0
CPU: FAIL
local = 0, outer = 1
local = 1, outer = 2
GPU: PASS

while with NVHPC 21.2 it gives correct results:

NVHPC 21.2.0
local = 0, outer = 1
local = 1, outer = 2
CPU: PASS
local = 0, outer = 1
local = 1, outer = 2
GPU: PASS

This looks like an alarming regression: with 21.3 and 21.5 the increment of outer is lost on the CPU path. Do you agree that this is a bug, or is the test case doing something undefined?
We can temporarily roll back to 21.2, but it would be great to get this fixed in the next release.
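
For reference, here is a minimal sketch of the semantics the test expects from the capture. It is not OpenACC code: std::atomic<int>::fetch_add is used as a stand-in for `#pragma acc atomic capture` on `local = outer++`, and the names CaptureResult and run_capture_model are hypothetical, introduced only for this illustration. It runs single-threaded, on the assumption that this mirrors the host path taken when the if(enable_gpu) clause is false.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the original reproducer.
// std::atomic<int>::fetch_add stands in for
//   #pragma acc atomic capture
//   local = outer++;
// It returns the value held before the increment, and the increment is not lost.
struct CaptureResult {
  std::vector<int> captured;  // value of `local` seen by each iteration
  int final_outer;            // value of `outer` after the loop
};

CaptureResult run_capture_model(std::size_t n) {
  std::atomic<int> outer{0};
  CaptureResult result{std::vector<int>(n, -1), 0};
  for (std::size_t i = 0; i < n; ++i) {
    // Capture the old value, then increment, as one atomic step.
    result.captured[i] = outer.fetch_add(1);
  }
  result.final_outer = outer.load();
  return result;
}
```

Calling run_capture_model(2) yields captured values 0 and 1 and a final counter of 2, which matches the CPU lines printed by 21.2.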

Hi olupton,

We went through a major revision of our atomic support, which unfortunately introduced a number of issues, though these have mostly been addressed. Your issue seems to have been fixed already in our 21.7 release, which came out yesterday. You can download 21.7 at: NVIDIA HPC SDK 21.7 Downloads | NVIDIA Developer

-Mat

% nvc++ -acc test.cpp -V21.5 ; a.out
NVHPC 21.5.0
local = 0, outer = 0
local = 0, outer = 0
CPU: FAIL
local = 0, outer = 2
local = 1, outer = 2
GPU: PASS
% nvc++ -acc test.cpp -V21.7 ; a.out
NVHPC 21.7.0
local = 0, outer = 1
local = 1, outer = 2
CPU: PASS
local = 0, outer = 1
local = 1, outer = 2
GPU: PASS
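
A side note on the GPU lines above: the value of outer printed inside the loop is read outside the atomic construct, so on the GPU it can differ from run to run (compare `outer = 1` and `outer = 2` for `local = 0`) without affecting PASS/FAIL. The following is a sketch of a variant that checks only the captured values and the final counter. The names run_capture_check and captured are hypothetical and not from the original test; without -acc the pragmas are ignored and the loop runs serially.

```cpp
constexpr int N = 2;

// Hypothetical variant of the reproducer, not from the original thread.
// Returns true when each value 0..N-1 was captured exactly once
// and the counter ends at N.
bool run_capture_check(bool enable_gpu) {
  int outer = 0;
  int captured[N];
  for (int i = 0; i < N; ++i) captured[i] = -1;

  #pragma acc parallel loop copy(outer) copy(captured[0:N]) if(enable_gpu)
  for (int i = 0; i < N; ++i) {
    int local;
    #pragma acc atomic capture
    local = outer++;
    captured[i] = local;  // only the captured value is used inside the loop
  }

  // Verify on the host, after the region, that the captured values
  // form a permutation of 0..N-1.
  bool seen[N] = {};
  for (int i = 0; i < N; ++i) {
    int v = captured[i];
    if (v < 0 || v >= N || seen[v]) return false;
    seen[v] = true;
  }
  return outer == N;
}
```

Calling run_capture_check(false) exercises the host path that fails with 21.3 and 21.5, and run_capture_check(true) exercises the accelerator path.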

Thank you, Mat, for the quick response and for pointing out that a newer release fixes this!

Just curious: where should I look for a list of such regressions and the releases that fix them? For example, I am looking at the 21.7 and 21.5 release notes, but I don’t see any mention of fixes related to atomics.

I wasn’t part of that decision, so I don’t actually know the reasons, but they stopped adding the compiler’s fixed-issues list to the release notes a while ago, when we did the PGI to NVHPC SDK rebranding.