Compiling a stdpar shared library using an object file

Our main software suite uses g++. I want to compile some code with nvc++ -stdpar=gpu as a shared library and have the main software suite link to the accelerated code. But that doesn’t seem to work when the stdpar shared library is built via an object file.

Everything works fine if I compile the shared library with a single command:

$ nvc++ -o libfor_each.so for_each_lib.cpp -shared -fPIC -stdpar=gpu
$ g++ for_each_main.cpp -L. -lfor_each -o for_each.exe
$ LD_LIBRARY_PATH=. ./for_each.exe

But if I compile via an object file, it doesn’t link properly:

$ nvc++ --gcc-toolchain=/cvmfs/icecube.opensciencegrid.org/py3-v4.4.0/RHEL_7_x86_64_v2 -stdpar=gpu -fPIC -o for_each.cpp.o -c for_each.cpp
$ nvc++  -o libfor_each.so for_each.cpp.o -shared -fPIC --gcc-toolchain=/cvmfs/icecube.opensciencegrid.org/py3-v4.4.0/RHEL_7_x86_64_v2
$ g++ for_each_main.cpp -L. -lfor_each -o for_each.exe
ld: ./libfor_each.so: undefined reference to `cudaGetDeviceCount'
ld: ./libfor_each.so: undefined reference to `cudaFree'
ld: ./libfor_each.so: undefined reference to `cudaPeekAtLastError'
ld: ./libfor_each.so: undefined reference to `Mcuda_compiled'
ld: ./libfor_each.so: undefined reference to `cudaGetDevice'
ld: ./libfor_each.so: undefined reference to `__cudaRegisterFunction'
ld: ./libfor_each.so: undefined reference to `cudaDeviceGetAttribute'
ld: ./libfor_each.so: undefined reference to `cudaFuncGetAttributes'
ld: ./libfor_each.so: undefined reference to `__pgiLaunchKernelFromStub'
ld: ./libfor_each.so: undefined reference to `cudaGetErrorName'
ld: ./libfor_each.so: undefined reference to `__pgi_cuda_register_fat_binaryA'
ld: ./libfor_each.so: undefined reference to `cudaSetDevice'
ld: ./libfor_each.so: undefined reference to `cudaStreamSynchronize'
ld: ./libfor_each.so: undefined reference to `cudaMalloc'
ld: ./libfor_each.so: undefined reference to `__cudaRegisterVar'
ld: ./libfor_each.so: undefined reference to `__cudaPushCallConfiguration'
ld: ./libfor_each.so: undefined reference to `cudaGetLastError'
ld: ./libfor_each.so: undefined reference to `cudaGetErrorString'
collect2: error: ld returned 1 exit status

Are object files not allowed for stdpar? I don’t necessarily need to use them, but I am using a build system that assumes everything is compiled via object files, and it is annoying to work around this. I don’t think it matters, but these are the files I am using:

for_each_lib.cpp:

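#include <algorithm>
#include <cstddef>
#include <execution>
#include <vector>

using namespace std;

// Functor applied to every element; the inner loop just generates work.
struct mul {
  void operator()(float& x) const {
    for (size_t i = 0; i < 0xFFFFF; i++) {
      x = x * 1.0000001f;
    }
  }
};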

void multiply_all(std::vector<float>& v) {

  // copy into a vector allocated on the heap
  vector<float> v1(v);

  // execute kernel
  std::for_each(
    std::execution::par_unseq,
    v1.begin(), v1.end(), mul{});

  // copy result back into the caller's vector
  std::copy(v1.begin(), v1.end(), v.begin());
}

for_each_main.cpp:

#include <vector>
#include <iostream>

using namespace std;

void multiply_all(std::vector<float>& v);

int main(){
    size_t N = 1000;
    vector<float> v1(N);
    for (size_t i=0; i<N; i++){
      v1[i]=float(i)/2;
    }
    cout << "\n";
    multiply_all(v1);
    for (size_t i=0; i<N; i+=N/10){
        cout << i << " " << v1[i] << "\n";
    }
    cout<< "\n";
}

Hi kmeagher,

I believe the issue is that you’re missing the “-stdpar=gpu” when creating the shared object. Without this flag, the compiler doesn’t know to link in the CUDA libraries with the shared object.

% nvc++ -stdpar=gpu -fPIC -o for_each.cpp.o -c for_each_lib.cpp
% nvc++ -o libfor_each.so for_each.cpp.o -shared -fPIC -stdpar=gpu
% g++ for_each_main.cpp -L. -lfor_each -o for_each.exe
% ./for_each.exe

0 0
100 58
200 116
300 166
400 232
500 282
600 332
700 382
800 464
900 514
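
For a build system that always goes through object files, one way to keep the compile and link steps in sync is to hold the flags in a single variable and use it for both. The following is only a sketch of that idea: the function name build_for_each_lib and the variables NVCXX and STDPAR_FLAGS are made up for illustration and are not defined by the HPC SDK, and the file names are the ones used in the commands above.

```shell
#!/bin/sh
# Sketch only: NVCXX, STDPAR_FLAGS and build_for_each_lib are illustrative
# names chosen for this example, not part of the NVIDIA HPC SDK.
NVCXX="${NVCXX:-nvc++}"
STDPAR_FLAGS="-stdpar=gpu -fPIC"

build_for_each_lib() {
    # Compile step: -stdpar=gpu enables GPU offload of the parallel algorithms.
    # The variables are left unquoted on purpose so the flags split into words.
    $NVCXX $STDPAR_FLAGS -o for_each.cpp.o -c for_each_lib.cpp || return 1
    # Link step: pass the same flags again so the driver links the CUDA libraries.
    $NVCXX $STDPAR_FLAGS -shared -o libfor_each.so for_each.cpp.o
}
```

After sourcing this, calling build_for_each_lib runs both steps, and the g++ link line for for_each_main.cpp stays the same as before.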

Hope this helps,
Mat

OK, I was able to get this to work if I turn off debugging. There was a -g hidden in my makefile.