Nvc++ nv/target header and if target (nv::target::is_device) are fragile

Hello,

Following the suggestion in Cannot dynamically load a shared library containing both OpenACC and CUDA code - #2 by MatColgrove, I have been trying to use if target (nv::target::is_device) statements to guard a few parts of our code that are dynamically unreachable and cause problems when compiled for GPU. In general, this has been working fine.
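
For context, here is a minimal sketch of the kind of guard I mean (not our real code; host_only_setup() is a hypothetical stand-in for code that is never reached on the GPU but breaks device compilation):

#include <nv/target>

void host_only_setup();  // hypothetical: host-only code that breaks device compilation

void run(double* data, int n) {
  #pragma acc kernels copy(data[0:n])
  {
    if target (nv::target::is_host) {
      // Only taken when the region runs on the host; the guard keeps
      // nvc++ from generating device code for the host-only call.
      host_only_setup();
    }
    for (int i = 0; i < n; ++i) {
      data[i] *= 2.0;
    }
  }
}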

I have now come across a new failure mode, which can easily be reproduced with:

$ cat test.cpp
#include <nv/target>
int main() {
  int x{};
  #pragma acc kernels
  {
    if target (nv::target::is_device) {
      x = 1;
    }
  }
  return x;
}
$ nvc++ -V22.3 -acc -cuda -gpu=cuda11.6 test.cpp
$ nvc++ -V22.3 -acc -cuda -gpu=cuda11.6 -I/path/to/cuda-11.6.1-ngetva/include test.cpp
NVC++-S-1062-Support procedure called within a compute region - __builtin_current_device_sm (test.cpp: 6)
NVC++/x86-64 Linux 22.3-0: compilation completed with severe errors

I can’t reproduce this with CUDA 11.7.0 and NVHPC 22.5, but I’m not sure whether that is just luck or whether it was fixed deliberately. Internally we are still using 22.3 because of some (presumably unrelated) issues with 22.5 and our code base.

I am not aware that we are doing anything wrong here, so it seems like a bug and I hope the reproducer is helpful.

Best, Olli

P.S. By way of background, this issue showed up in a Spack build of our application; Spack apparently implicitly includes extra -I... arguments via its compiler wrappers. We have picked up various fixes and workarounds over the years as we have tried to support various PGI/NVHPC releases and make our builds more robust. Some notes:

  • Historically we have had to use the explicit -gpu=cudaX.Y flag to avoid the default version being different on nodes with/without GPUs.
  • Spack does not know about the versions of CUDA bundled with NVHPC, so depending on CUDA brings an external installation into the picture. This may eventually be improved in Virtual CUDA by wyphan · Pull Request #30748 · spack/spack · GitHub. We can test if removing the explicit CUDA dependency is a viable workaround with recent NVHPC.
  • Outside Spack, we have historically had to explicitly add an external version of CUDA (with exactly the same version.json as the bundled one) to avoid issues such as Nvcc only partially respects CUDA_HOME ("Input file newer than toolkit").

Hi Olli,

I have put in a question to one of our C++ engineers to see if there was an intentional fix in 22.5, but like you, I can only reproduce this in 22.3; it’s OK in 22.5 and 22.7.

Now, there are slight differences between the CUDA header files we ship with the NVHPC SDK and those in the same version of the CUDA SDK. I’m not sure exactly what causes this error, but it’s likely due to one of those differences.

I’m wondering how Spack determines where to pick up the CUDA header files. Is it using the “CUDA_HOME” environment variable? If so, can you try setting CUDA_HOME to the CUDA package that ships with the NVHPC SDK?
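
For example, something like the following and then rebuilding, assuming the usual NVHPC SDK directory layout (the install path and CUDA version here are only illustrative; substitute your actual NVHPC install root):

$ export CUDA_HOME=/path/to/nvhpc/Linux_x86_64/22.3/cuda/11.6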

  • Historically we have had to use the explicit -gpu=cudaX.Y flag to avoid the default version being different on nodes with/without GPUs.
    Yes, that’s the main reason we have this flag, given that the compilation system can have a different CUDA driver version (or no driver) than the system the binary will be run on.

Spack does not know about the versions of CUDA bundled with NVHPC, so depending on CUDA brings an external installation into the picture…

I personally don’t use Spack so I won’t be of much help here, but Wileam is quite good, so he probably has the correct solution.

Outside Spack, we have historically had to explicitly add an external version of CUDA

Ok, so my suggestion of resetting CUDA_HOME may cause other issues, so it might not be the best solution…

In 22.7, we did start using “NVHPC_CUDA_HOME” instead of “CUDA_HOME” as the base directory for the CUDA version to use. This should help when you want to point the NVHPC compiler at a different CUDA SDK than the one we ship. It won’t help with this topic’s issue, but possibly with the one in this link.
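
With 22.7 or later, that would look something like this (the external CUDA path is only illustrative):

$ export NVHPC_CUDA_HOME=/path/to/external/cuda-11.7
$ nvc++ -acc -cuda -gpu=cuda11.7 test.cpp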

-Mat

Engineering got back to me. There was a mismatch between the “nv/target” header in the older CUDA versions and the one included in NVHPC. They have since updated the C++ front-end compiler in 22.5 to be more resilient to mismatched versions.

If you need to continue using 22.3, he suggests adding “-I/path/to/nvhpc/Linux_x86_64/22.3/compilers/include” so that the “nv/target” header we ship is picked up before the one included in the CUDA SDK.
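
Putting that together with the reproducer above, the workaround would look something like this (both include paths are placeholders for the actual install locations; the NVHPC include must come before the CUDA one):

$ nvc++ -V22.3 -acc -cuda -gpu=cuda11.6 -I/path/to/nvhpc/Linux_x86_64/22.3/compilers/include -I/path/to/cuda-11.6.1-ngetva/include test.cpp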

Hope this helps,
Mat