Add `-acc` to the nvc++ link line for the main program, and it will work as expected.
The issue is that when the main program is linked with nvc++, the compiler runtime gets initialized without OpenACC support unless `-acc` is also on the link line. The shared library uses that same runtime, so when execution reaches the OpenACC region in the library, the OpenACC code fails. When the main program is linked with g++ instead, runtime initialization is delayed until the shared object is loaded.
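The thread doesn't include minimal_main.cpp. Because its link line never names libshared.so, one reasonable assumption is that it loads the library at run time with `dlopen`, which is the load point the explanation above refers to. The sketch below shows that kind of loader under those assumptions; the `launch(int *)` signature, the `extern "C"` export, and the `run_shared_launch` helper are all hypothetical, not taken from the original code.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical entry-point type: assumes libshared.so exports
// extern "C" void launch(int *).
using launch_fn = void (*)(int *);

// Loads the shared library at run time and calls its launch() function.
// Returns 0 on success, 1 if the library or the symbol cannot be found.
int run_shared_launch(const char *path, int *value) {
    // The library is loaded here, at run time, rather than at program startup.
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    dlerror();  // clear any earlier error before calling dlsym
    auto fn = reinterpret_cast<launch_fn>(dlsym(handle, "launch"));
    if (fn == nullptr) {
        const char *err = dlerror();
        std::fprintf(stderr, "dlsym failed: %s\n", err ? err : "unknown error");
        dlclose(handle);
        return 1;
    }
    fn(value);
    // The handle is deliberately left open so the library stays loaded
    // for the rest of the program.
    return 0;
}
```

In main, a call such as `run_shared_launch("./libshared.so", &v)` would stand in for a direct call to `launch`, which is why libshared.so doesn't need to appear on the link line. On glibc versions older than 2.34, a program that uses `dlopen` also needs `-ldl` when linking.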
```
% setenv NV_ACC_TIME 1
% nvc++ -acc -cuda -gpu=lineinfo,nordc -fPIC -o minimal_directives.o -c minimal_directives.cpp
% nvc++ -acc -cuda -gpu=lineinfo,nordc -fPIC -shared -o libshared.so minimal_directives.o
% nvc++ -o main minimal_main.cpp
% ./main
Current file: /local/home/mcolgrove/minimal_directives.cpp
        function: launch
        line: 4
This file was compiled: -acc=gpu -gpu=cc80
% nvc++ -o main minimal_main.cpp -acc
% ./main

Accelerator Kernel Timing data
/local/home/mcolgrove/minimal_directives.cpp
  launch NVIDIA devicenum=0
    time(us): 56
    4: compute region reached 1 time
        4: kernel launched 1 time
            grid: [1] block: [1]
            device time(us): total=5 max=5 min=5 avg=5
            elapsed time(us): total=308 max=308 min=308 avg=308
    4: data region reached 2 times
        4: data copyin transfers: 1
             device time(us): total=8 max=8 min=8 avg=8
        6: data copyout transfers: 1
             device time(us): total=43 max=43 min=43 avg=43
```
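For reference, the library side of this setup can be as small as a single compute region. minimal_directives.cpp isn't shown in the thread, so the following is a hypothetical stand-in: an `extern "C"` `launch(int *)` (assumed signature) containing one `serial` region with a `copy` clause. Code of this shape would be expected to appear in NV_ACC_TIME output as one kernel launch plus one copyin and one copyout, similar in structure to the profile above, though the line numbers won't match.

```cpp
// Hypothetical stand-in for minimal_directives.cpp; the real file is not shown.
// extern "C" keeps the symbol unmangled so a dlsym-based loader can find "launch".
extern "C" void launch(int *x) {
    // One OpenACC compute region: copy(x[0:1]) copies the value to the device
    // on entry and back to the host on exit. Without -acc (for example plain
    // g++), the pragma is ignored and the block simply runs on the host.
#pragma acc serial copy(x[0:1])
    {
        x[0] += 1;
    }
}
```

It would be built into libshared.so with the same two `nvc++ -acc ...` commands shown in the transcript. Compiling it without OpenACC is a quick way to check the host logic, since the directive is then ignored.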