nvc++ 23.1 compilation issues with internal-linkage variables

Hello,

We have been updating our systems to use the latest NVHPC 23.1 release, after sticking with 22.3 for quite some time.

This has yielded some “interesting” compilation warnings/errors, which are sensitive to filesystem paths/locations.

It has proved difficult to construct a robust reproducer; the smallest one I have been able to obtain is the following:

$ pwd
/tmp/nvhpc_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
$ cat test.cpp
#include <iostream>
namespace {
__device__ int g{};
__attribute__((noinline)) __device__ int& get_g() {
  return g;
}
}
__device__ int f() {
  return get_g();
}
int main() { return 0; }
void foo() { std::cout << ""; }
$ nvc++ -O2 -cuda test.cpp
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/nvaccOEwk4xveOhal.gpu (33, 37): parse use of undefined value '@_st__tmp_nvhpc_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_test_cpp__ZN26_INTERNAL_8_test_cpp__Z1fv28_GLOBAL__N__8_test_cpp__Z1fv1gE'
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (test.cpp: 1)
NVC++/x86-64 Linux 23.1-0: compilation aborted

This is quite sensitive: on the system I’m testing on, even removing a single “x” from the path causes the compilation to succeed.

It seems the issue is related to how nvc++ creates globally unique names for variables in anonymous namespaces; these generated names appear to include a mangled version of the file path.
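For what it’s worth, the tail of the undefined symbol (everything after the _st_ plus sanitized-path prefix, which the device compiler appears to prepend itself) is ordinary Itanium mangling and demangles cleanly:

$ echo '_ZN26_INTERNAL_8_test_cpp__Z1fv28_GLOBAL__N__8_test_cpp__Z1fv1gE' | c++filt
_INTERNAL_8_test_cpp__Z1fv::_GLOBAL__N__8_test_cpp__Z1fv::g

i.e. it is the anonymous-namespace variable g, nested inside compiler-generated scopes that themselves embed the translation unit name.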

Exactly how this works seems to be quite fickle. For example, if I replace -O2 with -O then the name in the error message changes:

$ nvc++ -O -cuda test.cpp
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/nvaccP6xm7zWJXnbk.gpu (33, 37): parse use of undefined value '@_st__gpfs_bbp_cscs_ch_ssd_apps_bsd_2022_01_10_stage_compilers_install_gcc_4_8_5_haswell_gcc_11_2_0_ehzq5x_lib_gcc_x86_64_pc_linux_gnu_11_2_0_____________include_c___11_2_0_bits_char_traits_h__ZN26_INTERNAL_8_test_cpp__Z1fv28_GLOBAL__N__8_test_cpp__Z1fv1gE'
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (test.cpp: 1)
NVC++/x86-64 Linux 23.1-0: compilation aborted

I was also able to reproduce the issue with -O0, at which point __attribute__((noinline)) was no longer needed.

In essence, we have code roughly along these lines to wrap some global state that we need to manage on the device side: the state itself is managed using CUDA annotations and functions, and we define device functions that access it and carry OpenACC/OpenMP annotations (so they can be called from offloaded regions). This sidesteps limitations on global variables in OpenACC/OpenMP code that is loaded dynamically.
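A minimal sketch of the pattern (hypothetical names, and assuming nvc++’s combined CUDA/OpenACC compilation) looks something like:

namespace {
// Device-side state, owned and initialised from the CUDA side
// (e.g. via cudaMemcpyToSymbol from host code elsewhere).
__device__ int g_state{};
}

// Accessor carrying an OpenACC routine annotation so that it can be
// called from offloaded regions; routing access through a function
// rather than touching the global directly is what avoids the
// dynamic-loading limitations mentioned above.
#pragma acc routine seq
__device__ int& get_state() { return g_state; }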

Hi Olli,

For better or worse, I’m not able to reproduce this issue. I’ve tried multiple compiler versions, multiple systems (x86, Arm, Power), GPUs (P100, V100, A100), and CUDA versions (10.2, 11.0, 11.8, 12.0), with no luck.

luna:/tmp/nvhpc_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx% nvc++ -O2 -cuda test.cpp -V23.1 -gpu=keep
luna:/tmp/nvhpc_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%

Can you compile with “-gpu=keep” and post or send me the “test.n001.gpu” file so I can compare the generated device code with what I produce? It might give some clues as to the difference.

Also, which CUDA version are you using? Is it one of the CUDA SDKs that ship with the NVHPC SDK, or your own CUDA installation?

What is the target GPU? Is this on Piz Daint using a P100?

Finally, which OS is installed?

-Mat

Dear Mat,

Thanks for taking a look! I confess I am not that surprised that the reproducer didn’t turn out to be portable…

It doesn’t seem to matter which CUDA version is used. In the setup where we originally saw this it was an external installation of 11.8, but I have just checked with -gpu=cuda12.0, -gpu=cuda11.8 and -gpu=cuda11.0 (i.e. all the bundled versions) and I get the same error in every case.

With nvc++ -O2 -cuda test.cpp -gpu=cuda11.8,keep I get: test.n001.gpu (4.5 KB).

All my efforts above to make a small reproducer were on a login node without an attached GPU, but our system (BlueBrain5, not Piz Daint) has V100s. In the original build we had -gpu=cuda11.8,lineinfo,cc70,cc80, but the -gpu=ccXY option doesn’t seem to make any difference here – I tried a bunch of different values just now. The OS is RHEL 7.9.

Edit: one other piece of information (probably guessable from the error above): because the RHEL 7.9 system GCC is very old, we have a Spack installation of GCC 11.2, and nvc++ is configured to use it (via makelocalrc).
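For reference, the makelocalrc invocation looks roughly like this (placeholder paths, not our real ones):

$ makelocalrc -x \
    -gcc /path/to/spack/gcc-11.2.0/bin/gcc \
    -gpp /path/to/spack/gcc-11.2.0/bin/g++ \
    -g77 /path/to/spack/gcc-11.2.0/bin/gfortran \
    $NVHPC_ROOT/compilers/bin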

I hope this helps.
Cheers, Olli

Thanks Olli.

I’ll need to ask engineering for ideas. My theory was that it’s a string-length issue, given that the file path is encoded into the mangled symbol names, but I created symbol names four times longer than yours without issue. Hence there’s likely more to it.

I probably won’t have a response until Monday, but I will let you know when I have more information.

-Mat

Hi Mat,

Thanks! In the meantime we have been able to work around the issue by using external linkage, but it would be great if this fragility could be fixed in a future release.
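Concretely, the workaround was simply to give the variable external linkage instead of putting it in an anonymous namespace, along these lines (hypothetical name):

// before: internal linkage, triggers the fragile mangled name
// namespace { __device__ int g{}; }

// after: external linkage, uniquified by hand
__device__ int my_lib_detail_g{};

__attribute__((noinline)) __device__ int& get_g() { return my_lib_detail_g; }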

Cheers, Olli