We are currently working on making the GPU-enabled version of an application dynamically loadable from Python, which implies bundling GPU-enabled code in a shared library that is dlopen’d by an executable (Python) that is not linked against any OpenACC or OpenMP runtime libraries.
When operating in this mode, we are seeing problems with (at least) the acc_deviceptr API routine, which simply returns the (host) pointer passed to it instead of mapping it to a device pointer.
This is quite easy to reproduce in a standalone example:
$ cat main.cpp
#include <dlfcn.h>
#include <stdexcept>
#include <string>
int main() {
  // Load the OpenACC-enabled library at runtime; main itself links no OpenACC runtime.
  void* h = dlopen("./libfoo.so", RTLD_NOW);
  if(!h) { throw std::runtime_error{(std::string{"dlopen: "} + dlerror()).c_str()}; }
  // Look up the extern "C" entry point exported by lib.cpp.
  auto* openacc = reinterpret_cast<void(*)()>(dlsym(h, "openacc"));
  if(!openacc) { throw std::runtime_error{(std::string{"dlsym: "} + dlerror()).c_str()}; }
  openacc();
  return 0;
}
and
$ cat lib.cpp
#include <openacc.h>
#include <iostream>
extern "C" void openacc() {
  constexpr auto data_size = 1024;
  auto* data = new double[data_size];
  auto* d_data = static_cast<double*>(acc_copyin(data, data_size*sizeof(double)));
  std::cout << "data = " << data << ", d_data = " << d_data << ", acc_deviceptr(data) = " << acc_deviceptr(data) << std::endl;
}
compiled with
nvc++ -acc -gpu=nordc -shared -o libfoo.so lib.cpp
g++ -ldl -o main main.cpp
gives
$ ./main
data = 0x41cca0, d_data = 0x7fff9bafa000, acc_deviceptr(data) = 0x41cca0
i.e. acc_copyin returns a plausible device pointer, but a later call to acc_deviceptr returns the host pointer.
I am using nvc++ 22.3, and this does not appear to be correct behaviour. Is this a known issue, or is there an easy workaround? We can try to propagate the return value of acc_copyin manually to where it’s needed, but this may prove to be a challenge in the real application.
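To make the manual-propagation idea concrete, here is a minimal sketch of a host-to-device pointer table that could stand in for acc_deviceptr. Everything in it is an assumption for illustration: the class name DevicePtrRegistry and its record/lookup/erase methods are hypothetical and not part of OpenACC. The sketch itself makes no OpenACC calls; the comments mark where acc_copyin and acc_delete would sit. It is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical helper (not part of OpenACC): remembers the device pointer
// returned by acc_copyin for each mapped host range.
class DevicePtrRegistry {
public:
    // Call right after: void* device = acc_copyin(host, bytes);
    void record(const void* host, void* device, std::size_t bytes) {
        assert(host != nullptr && bytes > 0);
        ranges_[reinterpret_cast<std::uintptr_t>(host)] =
            Range{reinterpret_cast<std::uintptr_t>(device), bytes};
    }

    // Stand-in for acc_deviceptr(host). Also resolves pointers into the
    // middle of a recorded range. Returns nullptr if host is not covered.
    void* lookup(const void* host) const {
        const auto addr = reinterpret_cast<std::uintptr_t>(host);
        auto it = ranges_.upper_bound(addr);  // first range starting after addr
        if (it == ranges_.begin()) return nullptr;
        --it;                                 // last range starting at or before addr
        const auto offset = addr - it->first;
        if (offset >= it->second.bytes) return nullptr;
        return reinterpret_cast<void*>(it->second.device + offset);
    }

    // Call alongside: acc_delete(host, bytes);
    void erase(const void* host) {
        ranges_.erase(reinterpret_cast<std::uintptr_t>(host));
    }

private:
    struct Range {
        std::uintptr_t device;  // device base address as an integer
        std::size_t bytes;      // size of the mapped range
    };
    std::map<std::uintptr_t, Range> ranges_;  // keyed by host base address
};
```

In lib.cpp this would mean calling record(data, d_data, data_size*sizeof(double)) after acc_copyin and using lookup(data) wherever acc_deviceptr(data) is called today. The hard part in the real application is making one such table reachable from every place that needs a device pointer.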
Other observations:
- I included the -gpu=nordc option based on previous replies, but removing this option does not affect the results.
- I tried a #pragma acc host_data use_device(data) block: this behaves the same way as acc_deviceptr, returning the host address.
- Printing data from a device kernel with #pragma acc kernels present(data) returns the correct device pointer.
- If I set NVCOMPILER_ACC_NOTIFY=31 and add acc_delete(data, data_size*sizeof(double)) then I see a message like "delete CUDA data devaddr=0x7fff9bafa000 bytes=8192 device=0 threadid=1" containing the correct device pointer.
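The host_data check from the list above can be sketched roughly as follows. This is an illustrative reconstruction, not the exact test code: the helper name address_in_host_data is hypothetical, and the sketch uses enter data / exit data directives where the reproducer uses acc_copyin. When built without OpenACC support the pragmas are ignored and the function just returns the host address.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the address that `data` evaluates to inside
// a host_data use_device region. With a working OpenACC runtime this is
// expected to be the device address. In the dlopen setup described above it
// came back as the host address.
void* address_in_host_data(double* data, std::size_t n) {
    assert(data != nullptr && n > 0);
    void* seen = nullptr;
    // Map the array to the device (unstructured data lifetime).
    #pragma acc enter data copyin(data[0:n])
    #pragma acc host_data use_device(data)
    {
        // Inside this block `data` should refer to the device copy.
        seen = data;
    }
    // Release the device copy again.
    #pragma acc exit data delete(data[0:n])
    return seen;
}
```

Printing the return value next to the original host pointer shows whether the two differ, which is the same comparison as in the acc_deviceptr output above.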
Please let me know if I can provide any other information.