Separate compilation and linking with nvc++

Hi,

Is it possible for nvc++ to use function objects defined in a different translation unit from the one that calls a C++ standard library algorithm? As a small example, if the contents of the three files main.cpp, squared.cpp and squared.hpp are:

main.cpp:

#include "squared.hpp"
#include <vector>
#include <iostream>
#include <algorithm>
#include <execution>

int main(int argc, char *argv[])
{
  std::vector<int> v(1<<20,7);
  const auto pol = std::execution::par_unseq;
  std::for_each(pol, v.begin(), v.end(), squared{});
  std::cout << v[0] << '\n';
  return 0;
}

squared.cpp:

#include "squared.hpp"

void squared::operator()(int& x) { x = x * x; }

squared.hpp:

#ifndef SQUARED_HPP
#define SQUARED_HPP

struct squared
{
  void operator()(int&);
};

#endif // SQUARED_HPP

… a command such as nvc++ -stdpar -std=c++17 squared.cpp main.cpp will fail to link due to an undefined reference to squared::operator()(int&). I’m using nvc++ from v21.9 of the HPC SDK on Ubuntu 20.10.

Thanks,
Paul

Hi Paul,

The C++ standard doesn’t provide a way to decorate routines to indicate that a device version needs to be created, so we rely on the compiler generating one implicitly. However, the compiler can only discover which routines need device versions within a single translation unit, so across compilation units you instead need to rely on non-standard extensions.

You can either decorate the routine with the CUDA “__host__ __device__” attributes, or the OpenACC “acc routine” pragma.

% cat squared_d.cpp
#include "squared.hpp"

#ifdef _NVHPC_STDPAR_GPU
__host__ __device__
#endif
void squared::operator()(int& x) { x = x * x; }
% cat squared_acc.cpp
#include "squared.hpp"

#pragma acc routine
void squared::operator()(int& x) { x = x * x; }
% nvc++ -fast -stdpar squared_d.cpp main.cpp -V21.9; a.out
squared_d.cpp:
main.cpp:
49
% nvc++ -fast -stdpar -acc -Minfo=accel squared_acc.cpp main.cpp -V21.9 ; a.out
squared_acc.cpp:
squared::operator ()(int &):
      4, Generating acc routine seq
         Generating Tesla code
main.cpp:
49

Hope this helps,
Mat


Thanks Mat, that’s really helpful.

I assume there isn’t a flag to instruct nvc++ to create device versions of all routines it encounters?

Paul

That’s a good question. We have that functionality in OpenACC (-acc=routineseq i.e. compile every routine for the device), but I don’t think we’ve tested it with stdpar.
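
For reference, applying that flag to the original, undecorated sources would look something like the following (a sketch based on the compile lines earlier in this thread; as noted, this is untested with stdpar):

% nvc++ -fast -stdpar -acc=routineseq squared.cpp main.cpp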

I just tried it on your simple example and it looks like the compiler attempts to offload some of the implicitly included Thrust routines. It also looks like it tries to redefine some routines that already have device attributes. It’s a big-hammer approach, so it would likely need refinement before it could be used with stdpar.

Let me ask our C++ folks if it would even be feasible to implement, and if so, I can add a request for enhancement (RFE).


It’s been 2 years, so I was wondering if this guidance (use host device/acc routine) is still up to date. Is there a better way to handle this now?

Hi aklinvex,

Yes, this is still the way I’d recommend handling compilation of device routines whose definitions are in a separate source file.

Another option, if you can’t decorate the routines, is to use cross-file inlining so the call isn’t needed. Cross-file inlining is a bit of a pain, though, since it requires a two-pass compile: first with the “-Mextract=lib:libname” flag across all the sources to create an inline library, and then a second pass with “-Minline=lib:libname” to inline the routines. Not all routines can be inlined, in particular larger ones, so you might still need to fall back to using “acc routine”.
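
Concretely, with the file names from the example at the top of this thread, the two-pass compile might look like this (a sketch; the inline library name “ilib” is arbitrary):

% # pass 1: extract inline information from all sources into a library
% nvc++ -Mextract=lib:ilib squared.cpp main.cpp
% # pass 2: compile normally, inlining routines from that library
% nvc++ -fast -stdpar -Minline=lib:ilib squared.cpp main.cpp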

-Mat