Separate compilation and linking with nvc++


Is it possible for nvc++ to use function objects defined in a different translation unit from the one calling a C++ standard library algorithm? As a small example, if the contents of three files, main.cpp, squared.cpp and squared.hpp, are respectively:


#include "squared.hpp"
#include <vector>
#include <iostream>
#include <algorithm>
#include <execution>

int main(int argc, char *argv[])
{
  std::vector<int> v(1<<20,7);
  const auto pol = std::execution::par_unseq;
  std::for_each(pol, v.begin(), v.end(), squared{});
  std::cout << v[0] << '\n';
  return 0;
}


#include "squared.hpp"

void squared::operator()(int& x) { x = x * x; }


#ifndef __SQUARED_HPP__
#define __SQUARED_HPP__
struct squared
{
  void operator()(int&);
};
#endif // __SQUARED_HPP__

… a command such as nvc++ -stdpar -std=c++17 squared.cpp main.cpp will fail to link due to an undefined reference to squared::operator()(int&). I’m using nvc++ from v21.9 of the HPC SDK on Ubuntu 20.10.


Hi Paul,

The C++ standard doesn’t provide a way to decorate routines to indicate that a device version needs to be created, so we rely on the compiler generating one implicitly. To do that, the compiler has to see the routine’s definition at the point where it is used in the parallel algorithm, and it can’t see across compilation units. In this case you need to rely on non-standard extensions instead.

You can decorate the routine with either the CUDA “__host__ __device__” attributes or the OpenACC “acc routine” pragma.

% cat squared_d.cpp
#include "squared.hpp"

__host__ __device__
void squared::operator()(int& x) { x = x * x; }
% cat squared_acc.cpp
#include "squared.hpp"

#pragma acc routine
void squared::operator()(int& x) { x = x * x; }
% nvc++ -fast -stdpar squared_d.cpp main.cpp -V21.9; a.out
% nvc++ -fast -stdpar -acc -Minfo=accel squared_acc.cpp main.cpp -V21.9 ; a.out
squared::operator ()(int &):
      4, Generating acc routine seq
         Generating Tesla code
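
If the function object doesn’t have to live in its own translation unit, another option that follows from the explanation above is to define operator() in the header, so that the compiler sees the body in every file that calls the algorithm. The following is only a minimal sketch of that idea and was not run as part of this thread; the names squared_inline and square_all are made up for illustration.

```cpp
// Hypothetical header: the call operator is defined in-class, so each
// translation unit that includes this file sees the function body.
#include <algorithm>
#include <execution>
#include <vector>

struct squared_inline {
  // In-class definitions are implicitly inline, so no separate .cpp is needed.
  void operator()(int& x) const { x = x * x; }
};

// Hypothetical helper that squares every element using the parallel policy.
inline void square_all(std::vector<int>& v) {
  std::for_each(std::execution::par_unseq, v.begin(), v.end(), squared_inline{});
}
```

Calling square_all on a vector filled with 7 leaves every element equal to 49. The trade-off is that the definition is no longer separately compiled, which is exactly what the decorations above avoid.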

Hope this helps,


Thanks Mat, that’s really helpful.

I assume there isn’t a flag to instruct nvc++ to create device versions of all routines it encounters?


That’s a good question. We have that functionality in OpenACC (-acc=routineseq i.e. compile every routine for the device), but I don’t think we’ve tested it with stdpar.

I just tried it on your simple example, and it looks like the compiler attempts to offload some of the implicitly included Thrust routines. It also looks like it’s trying to redefine some device attribute routines. It’s a big-hammer approach, so it would likely need refinement before it could be used with stdpar.

Let me ask our C++ folks if it would even be feasible to implement, and if so, I can add a request for enhancement (RFE).
