Hello,
We have been updating our systems to use the latest NVHPC 23.1 release, after sticking with 22.3 for quite some time.
This has yielded some “interesting” compilation warnings/errors, which are sensitive to filesystem paths/locations.
It has proved difficult to get a robust reproducer; the smallest I have been able to obtain is the following:
$ pwd
/tmp/nvhpc_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
$ cat test.cpp
#include <iostream>
namespace {
  __device__ int g{};
  __attribute__((noinline)) __device__ int& get_g() {
    return g;
  }
}
__device__ int f() {
  return get_g();
}
int main() { return 0; }
void foo() { std::cout << ""; }
$ nvc++ -O2 -cuda test.cpp
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/nvaccOEwk4xveOhal.gpu (33, 37): parse use of undefined value '@_st__tmp_nvhpc_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_test_cpp__ZN26_INTERNAL_8_test_cpp__Z1fv28_GLOBAL__N__8_test_cpp__Z1fv1gE'
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (test.cpp: 1)
NVC++/x86-64 Linux 23.1-0: compilation aborted
This is quite sensitive: on the system that I’m testing on, removing even a single “x” from the path causes the compilation to succeed.
It seems that the issue is somehow related to how nvc++ creates globally unique names for variables in anonymous namespaces, which seem to include a mangled version of the file path.
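To illustrate what I mean by "a mangled version of the file path", here is a minimal sketch of the kind of transformation that would produce the prefixes seen in the error messages. This is purely my guess from the output: `sanitize_path` is a hypothetical helper, not anything from the compiler, and I am assuming that every non-alphanumeric character is simply replaced with an underscore. The shortened path in `check_examples` is also just an example.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (NOT the compiler's actual implementation):
// replace every character that is not a letter or digit with '_'.
std::string sanitize_path(const std::string& path) {
    std::string out;
    out.reserve(path.size());
    for (unsigned char c : path) {
        out.push_back(std::isalnum(c) ? static_cast<char>(c) : '_');
    }
    return out;
}

// Small self-check using a shortened, made-up version of the reproducer path.
void check_examples() {
    // '/' and '.' both become '_', matching the "..._test_cpp" suffix
    // that appears in the undefined symbol name.
    assert(sanitize_path("/tmp/nvhpc_xxx/test.cpp") == "_tmp_nvhpc_xxx_test_cpp");
    // A relative component such as "/../" collapses into a run of underscores.
    assert(sanitize_path("a/../b") == "a____b");
}
```

If the real scheme is anything like this, the symbol name grows with the path length, which would fit the observation that removing a single character from the directory name makes the problem go away. I do not know whether a length limit, truncation, or something else is actually responsible.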
Exactly how this works seems to be quite fickle. For example, if I replace -O2 with -O then the name in the error message changes:
$ nvc++ -O -cuda test.cpp
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/nvaccP6xm7zWJXnbk.gpu (33, 37): parse use of undefined value '@_st__gpfs_bbp_cscs_ch_ssd_apps_bsd_2022_01_10_stage_compilers_install_gcc_4_8_5_haswell_gcc_11_2_0_ehzq5x_lib_gcc_x86_64_pc_linux_gnu_11_2_0_____________include_c___11_2_0_bits_char_traits_h__ZN26_INTERNAL_8_test_cpp__Z1fv28_GLOBAL__N__8_test_cpp__Z1fv1gE'
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (test.cpp: 1)
NVC++/x86-64 Linux 23.1-0: compilation aborted
I was also able to reproduce the issue with -O0, at which point the __attribute__((noinline)) attribute was not needed.
In essence, we have code approximately along these lines to wrap some global state that we need to manage on the device side. The global state is managed using CUDA annotations and methods, and we define some device functions that access that global state and carry OpenACC/OpenMP annotations, so that they can be called from offloaded regions. This avoids limitations on global variables in OpenACC/OpenMP code that is loaded dynamically.