Is it possible to conditionally compile sections of routines which may run on CPU, GPU, or both? The most common scenarios I’m coming across are calls to C++ standard output streaming operators. Here is a simplified example:
A command such as nvc++ -fast -stdpar -acc -std=c++17 cc.cpp -V21.9 produces an error message from the linker:
nvlink error : Undefined reference to '_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l' in '/tmp/nvc++gTVdsZX_6kDZ.o'
nvlink error : Undefined reference to 'strlen' in '/tmp/nvc++gTVdsZX_6kDZ.o'
nvlink error : Undefined reference to '_ZNSt9basic_iosIcSt11char_traitsIcEE5clearESt12_Ios_Iostate' in '/tmp/nvc++gTVdsZX_6kDZ.o'
pgacclnk: child process exit status 2: /opt/nvidia/hpc_sdk_multi/Linux_x86_64/21.9/compilers/bin/tools/nvdd
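For anyone reading the log above: the undefined symbols are host-side iostream internals (plus strlen) that the stream insertion pulls into the device code, and nvlink finds no device definition for them. Here is a minimal sketch for decoding such names, assuming a GCC-compatible toolchain that provides abi::__cxa_demangle in <cxxabi.h>. The helper name demangle is hypothetical.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged when the name cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc; the caller frees it.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Passing the first symbol from the log should give a signature for std::__ostream_insert, and the third one a signature for std::basic_ios<...>::clear. Running the names through c++filt in a shell does the same job.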
I’m afraid I wasn’t very clear with my question. Ideally, I’m hoping to leave the code segments which don’t run on the GPU in place. Is there any option to use, say, preprocessor directives to control which parts of a routine are compiled for the GPU?
Yes, though I was hoping you wouldn’t ask ;-). The example I originally wrote for you used it, but it triggered a compiler bug when used inside the offloaded operator. So after the bug is fixed (for tracking it’s filed under TPR#30946), you can do something like the following example.
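A minimal sketch of what host-only output inside an offloaded functor could look like (this is not the example originally posted). It assumes the NV_IF_TARGET and NV_IS_HOST macros from the HPC SDK's <nv/target> header, which should expand to “if target” under nvc++. Scale, scale_all and HOST_ONLY are hypothetical names, and the non-NVIDIA branch is only a plain host fallback so the code builds with other compilers. Whether it compiles with 21.9 depends on the bug mentioned above.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

#if defined(__NVCOMPILER)
#  include <execution>
#  include <nv/target>
// Assumption: NV_IF_TARGET(NV_IS_HOST, (...)) keeps the statements for the
// host version only and drops them from the device version.
#  define HOST_ONLY(...) NV_IF_TARGET(NV_IS_HOST, (__VA_ARGS__))
#else
// Other compilers generate host code only, so the statements are kept as-is.
#  define HOST_ONLY(...) __VA_ARGS__
#endif

// Hypothetical functor standing in for the "offloaded operator".
struct Scale {
    int factor;
    void operator()(int& x) const {
        x *= factor;
        // Stream output is wanted on the CPU but must not reach device code.
        HOST_ONLY(std::clog << "scaled to " << x << '\n';)
    }
};

// Hypothetical driver: scales every element of v in place.
void scale_all(std::vector<int>& v, int factor) {
#if defined(__NVCOMPILER)
    // With -stdpar this call is a candidate for GPU offload.
    std::for_each(std::execution::par_unseq, v.begin(), v.end(), Scale{factor});
#else
    std::for_each(v.begin(), v.end(), Scale{factor});
#endif
}
```

The point of the wrapper macro is that the streaming statement stays in the routine, as asked, while only the host version of the operator references the iostream symbols.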
“if target” is our replacement for CUDA’s “__CUDA_ARCH__” macro, which can’t be used with nvc++ since it’s a single-pass compiler. nvcc takes two passes and splits the code into separate device and host versions, while nvc++ generates the device and host versions in the back-end. Full details can be seen in Bryce’s April 2021 GTC talk, starting around the 15-minute mark: Inside NVC++ and NVFORTRAN - Bryce Adelstein Lelbach - GTC 2021 - YouTube
Haha - many thanks Mat, I missed that talk from Bryce. I’d suspected this was the general direction of travel, but it’s great to see it laid out so clearly - and to learn of “if target”. Really impressive.