Hi, I recently upgraded to pgc++ under nvhpc 20.9. A large code base that used to run fine (and still does with 19.10 on Linux) now hits a very strange segfault. Sadly, when I pulled this piece out of the code base, I could not reproduce the segfault in isolation, as I have done when reporting compiler bugs in the past. So I’ll try my best to describe what I’ve seen.
There is only one CPU thread running, and the OpenACC kernels run on async queue 0. The segfault happens the second time the program reaches the line “h (dt, a)”, and gdb cannot step into h (dt, a).
void g (double dt, double& a) {
    f (a);
    h (dt, a);
}
void f (double& a) {
    f_wrapper2 (a, A_FEW_MORE_ARGS);
}
void f_wrapper2 (double& a, A_FEW_MORE_ARGS) {
    f_acc (a, A_FEW_MORE_ARGS);
}
where f_acc is extremely simple and does almost nothing:
void f_acc (double& a, A_FEW_MORE_ARGS) {
    #pragma acc wait(0)
}
One more thing I noticed: when I set “watch *(double*)0x7fffffffd2c8” in gdb (the address of the variable “dt”), its value, which should have been 0.001, changed immediately after the second call to f_acc. I later tracked this down to the following instruction:
0x7fffdc6c09dd callq 0x7fffdc67e5d0 <__pgi_uacc_cuda_wait@plt>__pgi_uacc_wait+423
The value of dt was changed inside this call. The final message given by gdb was something like
received signal SIGSEGV, Segmentation fault.
g (dt=0.4816228645377123, a=<error reading variable>)
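In case it helps narrow this down, here is a minimal sketch of a check that could be placed around the call to f (a) to catch the overwrite of dt without gdb. It is not taken from the real code base: dt_survives_f is a hypothetical helper, the extra arguments (A_FEW_MORE_ARGS) are omitted, and the wrapper layer is collapsed into a single call. The pragma is guarded by _OPENACC so the sketch also builds with a compiler that has OpenACC disabled.

```cpp
// Hypothetical stand-in for the real f_acc; the extra arguments are omitted.
void f_acc(double& a) {
    (void)a;  // a is not touched, as in the original
#ifdef _OPENACC
    #pragma acc wait(0)
#endif
}

// The f_wrapper2 layer from the post is collapsed here for brevity.
void f(double& a) {
    f_acc(a);
}

// Hypothetical helper: returns true if a stack copy of dt still holds
// the same value after f(a) returns.
bool dt_survives_f(double dt, double& a) {
    // volatile forces a real store before the call and a real load after it,
    // so the compiler cannot fold the comparison away.
    volatile double on_stack = dt;
    const double expected = dt;
    f(a);
    return on_stack == expected;
}
```

Built standalone, I would expect this to return true, since I could not reproduce the problem in isolation. The idea is only that the same before/after comparison could be dropped into g in the real code to confirm that the overwrite happens inside the wait.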
This is what I can provide at this time and I would appreciate your suggestions. Thanks in advance.
-stw