Conditional compilation of CPU/GPU code with nvc++

Hi,

Is it possible to conditionally compile sections of routines which may run on CPU, GPU, or both? The most common scenarios I’m coming across are calls to C++ standard output streaming operators. Here is a simplified example:

#include <algorithm>
#include <iostream>
#include <vector>
#include <execution>

struct A
{
  template <typename T>
  void operator()(T& x)
  {
    x++;
    std::cout << __PRETTY_FUNCTION__ << '\n';
  }
};

int main()
{
  std::vector<int> v = {5,100,3,6,6,109,64,234,656,25,7,44,6,232,2};
  const auto pol = std::execution::par_unseq;
  std::for_each(pol, v.begin(), v.end(), A{});
  std::cout << v[0] << '\n'; // 6
  return 0;
}

A command such as nvc++ -fast -stdpar -acc -std=c++17 cc.cpp -V21.9 produces an error message from the linker:

nvlink error   : Undefined reference to '_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l' in '/tmp/nvc++gTVdsZX_6kDZ.o'
nvlink error   : Undefined reference to 'strlen' in '/tmp/nvc++gTVdsZX_6kDZ.o'
nvlink error   : Undefined reference to '_ZNSt9basic_iosIcSt11char_traitsIcEE5clearESt12_Ios_Iostate' in '/tmp/nvc++gTVdsZX_6kDZ.o'
pgacclnk: child process exit status 2: /opt/nvidia/hpc_sdk_multi/Linux_x86_64/21.9/compilers/bin/tools/nvdd

Many thanks,
Paul

Hi Paul,

Only basic I/O is currently supported in GPU code, so iostream can’t be used there. Any streaming calls in offloaded code will need to be ported to printf:

% cat test.cpp
#include <algorithm>
#include <iostream>
#include <vector>
#include <execution>
#include <cstdio>

struct A
{
  template <typename T>
  void operator()(T& x)
  {
    x++;
    printf("%s\n",__PRETTY_FUNCTION__);
  }
};

int main()
{
  std::vector<int> v = {5,100,3,6,6,109,64,234,656,25,7,44,6,232,2};
  const auto pol = std::execution::par_unseq;
  std::for_each(pol, v.begin(), v.end(), A{});
  std::cout << v[0] << '\n'; // 6
  return 0;
}

% nvc++ -fast -stdpar test.cpp ; a.out
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
void A::operator()(T &) [with T = int]
6

Thanks Mat.

I’m afraid I wasn’t very clear with my question. Ideally I’m hoping to leave the code segments which don’t run on GPU in place. Is there any option to use say preprocessor directives to control which parts of a routine should be compiled for the GPU?

Regards,
Paul

Yes, though I was hoping you wouldn’t ask ;-). The example I originally wrote for you used it, but it triggered a compiler bug when used inside the offloaded operator. So after the bug is fixed (for tracking it’s filed under TPR#30946), you can do something like the following example.

“if target” is our replacement for CUDA’s “__CUDA_ARCH__” macro, which can’t be used with nvc++ since it’s a single-pass compiler. nvcc takes two passes and splits the code into separate device and host versions, while nvc++ generates the device and host versions in the back-end. Full details are in Bryce Adelstein Lelbach’s April 2021 GTC talk, “Inside NVC++ and NVFORTRAN”, starting around the 15-minute mark.

#include <algorithm>
#include <iostream>
#include <vector>
#include <execution>
#include <nv/target>
#include <cstdio>

struct A
{
  template <typename T>
  void operator()(T& x)
  {
    x++;
    if target (nv::target::is_device) {
      printf("%s\n",__PRETTY_FUNCTION__);
    } else {
      std::cout << __PRETTY_FUNCTION__ << '\n';
    }
  }
};

int main()
{
  std::vector<int> v = {5,100,3,6,6,109,64,234,656,25,7,44,6,232,2};
  const auto pol = std::execution::par_unseq;
  std::for_each(pol, v.begin(), v.end(), A{});
  std::cout << v[0] << '\n'; // 6
  return 0;
}
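
If the functor also has to build with compilers that don’t understand “if target”, a minimal sketch of the same idea can use the NV_IF_ELSE_TARGET macro that <nv/target> provides. The name BumpAndReport and the HAVE_NV_TARGET guard are hypothetical, made up for this illustration, and the host-only fallback is an assumption about what you would want when the header is missing. Whether the macro form is affected by the same compiler bug is not established here.

```cpp
#include <cstdio>
#include <iostream>

// <nv/target> ships with the NVIDIA HPC SDK and the CUDA toolkit.
// HAVE_NV_TARGET is a hypothetical guard so the file still builds
// (host-only) with a compiler that does not provide the header.
#if defined(__has_include)
#  if __has_include(<nv/target>)
#    include <nv/target>
#    define HAVE_NV_TARGET 1
#  endif
#endif

struct BumpAndReport  // hypothetical name, mirrors struct A above
{
  template <typename T>
  void operator()(T& x) const
  {
    x++;
#if defined(HAVE_NV_TARGET)
    // Each branch is wrapped in its own parentheses so that commas
    // inside it are not treated as macro argument separators.
    NV_IF_ELSE_TARGET(NV_IS_DEVICE,
      (printf("%s\n", __PRETTY_FUNCTION__);),
      (std::cout << __PRETTY_FUNCTION__ << '\n';))
#else
    // No <nv/target>: assume host-only compilation and use iostream.
    std::cout << __PRETTY_FUNCTION__ << '\n';
#endif
  }
};
```

It drops in where A{} is used, for example std::for_each(pol, v.begin(), v.end(), BumpAndReport{});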

Haha - many thanks Mat, I missed that talk from Bryce. I’d suspected this was the general direction of travel, but it’s great to see it laid out so clearly - and to learn of “if target”. Really impressive.