nvc++ -stdpar functionality possible without a single compilation unit? Host linker?

Hi mfeuling01,

The problem here is that you’re explicitly adding the “-lnvc” library on the command line. nvc++ implicitly adds the nvc runtime library, so there’s no need to add it yourself. When you do add it, it ends up out of order relative to where it needs to be on the link line.

There’s no need to put any of the compiler runtime libraries on the link line. We also set the rpath implicitly, so there’s no need for that either.

```
% nvc++ -stdpar=gpu test_execute.o test_standalone.cpp -o test_standalone.exe -V22.11
test_standalone.cpp:
% ./test_standalone.exe 1
Coeff: 1
array_in memory type: 3
```

  1. What is the correct way to separately compile modules that use -stdpar=gpu functionality with nvc++? I’m familiar with separate compilation and linking with nvcc, but I don’t think that should be necessary here, since the device code should be generated and inlined entirely within test_execute().

Here you only need host linking, so this isn’t an issue. For device linking, we enable RDC (relocatable device code) by default, and nvc++ invokes the device linker as part of the link step.
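As a minimal sketch of such a separately compiled module, here is a hypothetical stand-in for test_execute() (the file name `scale_array.cpp`, the function `scale_array`, and its signature are assumptions, not the original poster’s code). The assumed build lines in the comments mirror the command above: compile the module to an object with nvc++, then let nvc++ do the final link without adding `-lnvc` or an rpath. Built with another C++17 compiler, the same code simply runs the transform on the host.

```cpp
// scale_array.cpp: hypothetical stand-in for the module behind test_execute.o.
// Assumed build steps (nvc++ adds its own runtime libraries and rpath):
//   nvc++ -stdpar=gpu -c scale_array.cpp
//   nvc++ -stdpar=gpu scale_array.o main.cpp -o main.exe
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Multiplies every element of `in` by `coeff` and writes the result into `out`.
// Under nvc++ -stdpar=gpu, the par_unseq policy allows this transform to be
// offloaded; the lambda is defined here, so its device code is generated in
// this translation unit.
void scale_array(const std::vector<double>& in, std::vector<double>& out, double coeff) {
    assert(in.size() == out.size());
    std::transform(std::execution::par_unseq, in.begin(), in.end(), out.begin(),
                   [coeff](double x) { return coeff * x; });
}
```

A caller in another file only needs the declaration `void scale_array(const std::vector<double>&, std::vector<double>&, double);`.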

  2. If the separate compilation described in #1 is possible, can I link with a host compiler such as g++ (ultimately to use with mex)? I tried switching to g++ in my linking step above and was able to compile and link, but when I ran ./test_execute.exe, I got “No CUDA device code available”. Being restricted to only nvc++ is incredibly limiting, so I’m hoping there’s a way to achieve this.

If you’re creating a shared object, or linking with a different compiler such as g++, you may need to add “-gpu=nordc”, which removes the need for device linking.
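A rough sketch of packaging the nvc++-compiled part for a host linker, assuming it is built as a shared object that g++- or mex-built code calls into. The `extern "C"` entry point `scale_in_place`, the library name, and the build lines in the comments are illustrative assumptions; only `-gpu=nordc` comes from the advice above. The function copies the caller’s data into a `std::vector` allocated inside the nvc++-compiled file, on the assumption (worth checking against the HPC SDK stdpar documentation for your version) that only heap allocations made from code compiled with `-stdpar=gpu` are placed in CUDA managed memory. CUDA’s `cudaMemoryTypeManaged` has the value 3, which is presumably what the “array_in memory type: 3” line above reports.

```cpp
// scale_lib.cpp: hypothetical module intended to be called from g++- or mex-built code.
// Assumed build steps (illustrative; adjust paths and names for your setup):
//   nvc++ -stdpar=gpu -gpu=nordc -fPIC -shared scale_lib.cpp -o libscale.so
//   g++ main.cpp -L. -lscale -o main.exe
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <vector>

// C linkage keeps the symbol unmangled, so it is easy to declare from g++ or mex code.
extern "C" void scale_in_place(double* data, std::size_t n, double coeff) {
    assert(data != nullptr || n == 0);
    // Copy into a buffer allocated by this nvc++-compiled file (assumed to be
    // managed memory under -stdpar=gpu) instead of handing the caller's pointer,
    // which g++ code may have allocated, straight to the parallel algorithm.
    std::vector<double> buf(data, data + n);
    std::transform(std::execution::par_unseq, buf.begin(), buf.end(), buf.begin(),
                   [coeff](double x) { return coeff * x; });
    // Copy the results back into the caller's array on the host.
    std::copy(buf.begin(), buf.end(), data);
}
```

On the g++ or mex side, only the declaration `extern "C" void scale_in_place(double*, std::size_t, double);` is needed. The extra copies trade some bandwidth for not depending on how the caller allocated its buffer.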

The caveat is that without RDC, calls from device code (for example, a function called from within the body of the transform) can only be made to other device functions defined in the same source file, so that they can be inlined.
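To illustrate the caveat, here is a minimal sketch in which the helper called from inside the transform body is defined in the same source file, so a `-gpu=nordc` build can inline it into the offloaded code. The names `damp` and `apply_damping` are hypothetical. The commented-out declaration shows the pattern that would instead require RDC and the device link step.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Helper called from the parallel transform's body. Its definition lives in this
// same source file, so the compiler can inline it into the device code, which is
// what a -gpu=nordc build relies on.
static double damp(double x, double coeff) {
    return x / (1.0 + coeff * x * x);
}

// By contrast, a helper that is only declared here, e.g.
//   double damp_elsewhere(double x, double coeff);  // defined in another .cpp
// and then called inside the lambda would need relocatable device code (the nvc++
// default) plus device linking, so it should be avoided in a nordc build.

void apply_damping(std::vector<double>& v, double coeff) {
    assert(coeff >= 0.0);  // a non-negative coeff keeps the denominator >= 1
    std::transform(std::execution::par_unseq, v.begin(), v.end(), v.begin(),
                   [coeff](double x) { return damp(x, coeff); });
}
```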

Hope this helps,
Mat
