We are currently working on making the GPU-enabled version of an application dynamically loadable from Python, which implies bundling GPU-enabled code in a shared library that is dlopen'd by an executable (Python) that is not linked against any OpenACC or OpenMP runtime libraries.
When working on a minimal example of this using OpenACC and NVC++ 22.2, I came across the following strange behaviour.
If I build a shared library using OpenACC:
% nvc++ -acc -cuda -gpu=cc70,lineinfo -fPIC -o minimal_directives.o -c minimal_directives.cpp
% nvc++ -acc -cuda -gpu=cc70,lineinfo -fPIC -shared -o libshared.so minimal_directives.o
% cat minimal_directives.cpp
extern "C" int launch() {
int x{3};
#pragma acc kernels copy(x)
{
x = 42;
}
return x != 42;
}
And use nvc++ to build a minimal driver executable:
% nvc++ -o main minimal_main.cpp
% cat minimal_main.cpp
#include <dlfcn.h>
#include <stdexcept> // std::runtime_error
using launch_t = int (*)();
int main() {
  void* h = dlopen("./libshared.so", RTLD_NOW);
  if(!h) { throw std::runtime_error(dlerror()); }
  auto* launch = reinterpret_cast<launch_t>(dlsym(h, "launch"));
  if(!launch) { throw std::runtime_error(dlerror()); }
  return launch();
}
Then the executable aborts:
% ./main
Current file: /gpfs/bbp.cscs.ch/home/olupton/nvhpc-gpu-code-in-shared-library/minimal_directives.cpp
function: launch
line: 4
This file was compiled: -acc=gpu -gpu=cc70
% echo $?
1
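For reference, a variant of launch() that might help narrow down whether the OpenACC runtime initialises at all inside the dlopen'd library (an untested sketch; launch_debug is a made-up name, and acc_device_nvidia is the NVHPC device type from openacc.h):

#include <openacc.h>
extern "C" int launch_debug() {
  // Ask the runtime, from inside the dlopen'd library, whether it can see
  // any NVIDIA devices before attempting a kernel launch.
  if(acc_get_num_devices(acc_device_nvidia) == 0) { return 2; }
  // Initialise the runtime explicitly instead of relying on the implicit
  // initialisation triggered by the first kernel launch.
  acc_init(acc_device_nvidia);
  int x{3};
#pragma acc kernels copy(x)
  {
    x = 42;
  }
  return x != 42;
}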
But if I build the same code using g++, it works:
% g++ -ldl -o main_gcc minimal_main.cpp
% ./main_gcc
% echo $?
0
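Since the g++-built driver does not link any OpenACC runtime itself, I wonder whether symbol visibility in the nvc++-built driver plays a role; one variant that could be worth trying (a sketch I have not verified changes anything) loads the library with RTLD_GLOBAL:

#include <dlfcn.h>
#include <stdexcept>
using launch_t = int (*)();
int main() {
  // Identical to minimal_main.cpp except for the dlopen flags: RTLD_GLOBAL
  // makes the library's symbols, and those of its dependencies, visible to
  // anything loaded afterwards.
  void* h = dlopen("./libshared.so", RTLD_NOW | RTLD_GLOBAL);
  if(!h) { throw std::runtime_error(dlerror()); }
  auto* launch = reinterpret_cast<launch_t>(dlsym(h, "launch"));
  if(!launch) { throw std::runtime_error(dlerror()); }
  return launch();
}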
The compiler versions are:
% nvc++ --version
nvc++ 22.2-0 64-bit target on x86-64 Linux -tp skylake-avx512
...
% g++ --version
g++ (Spack GCC) 11.2.0
...
I believe the behaviour is the same with 22.3.
If I remove the -cuda option from the nvc++ invocations, loading the shared library from the main executable works. -cuda is not needed in this minimal example, but in the real application we use some CUDA APIs and call functions defined in CUDA code from OpenACC regions, so we assume that -cuda will be needed there. I have not yet managed to produce a dynamically loadable shared library that mixes OpenACC and CUDA code, but that will be a topic for a different thread.
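For context, the kind of mixing I mean is roughly the following (a hypothetical sketch, not code from the real application; it is the combination of a direct CUDA runtime API call with an OpenACC region in one translation unit that makes us expect to need -cuda):

#include <cuda_runtime_api.h>
extern "C" int launch_mixed() {
  // A plain CUDA runtime API call in the same translation unit as OpenACC.
  int device{-1};
  if(cudaGetDevice(&device) != cudaSuccess) { return 2; }
  int x{3};
#pragma acc kernels copy(x)
  {
    x = 42;
  }
  return x != 42;
}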
This behaviour seems surprising to me. Hopefully it will not be a blocker, as Python will typically not be compiled with nvc++, but it seemed worth flagging.