Dynamically loading an OpenACC-enabled shared library from an executable compiled with nvc++ does not work

Add “-acc” to the nvc++ link line for the executable and it will work as expected.

The issue is that when the executable is linked with nvc++, the compiler runtime gets initialized without OpenACC support unless “-acc” is added. Since the executable and the shared library share the same compiler runtime, the OpenACC code fails when execution reaches the OpenACC region in the library. With g++, the runtime initialization is delayed until the shared object is loaded.
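
For reference, here is a rough sketch of the kind of host program that runs into this. It is not the actual minimal_main.cpp, which isn't shown here. The function name `launch` comes from the profile output, but the `extern "C" void launch()` signature, the helper name `call_from_library`, and the library path are all assumptions for illustration.

```cpp
// Hypothetical stand-in for the host side of the reproducer (assumed, not the
// original minimal_main.cpp).
#include <dlfcn.h>
#include <cstdio>

// Assumed signature: the library is assumed to export `extern "C" void launch()`.
using launch_fn = void (*)();

// Loads the shared library at `path`, looks up `symbol`, and calls it.
// Returns true if the library loaded and the function was called.
bool call_from_library(const char* path, const char* symbol) {
    void* handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return false;
    }

    dlerror();  // clear any stale error before the lookup
    void* sym = dlsym(handle, symbol);
    if (!sym) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is null");
        dlclose(handle);
        return false;
    }

    // The OpenACC region inside the library runs here. This is the point where
    // the failure shows up if the executable was linked without -acc.
    reinterpret_cast<launch_fn>(sym)();

    dlclose(handle);
    return true;
}
```

A `main` would just call something like `call_from_library("./libshared.so", "launch")`. The helper only reports load and lookup failures; it can't detect the OpenACC runtime problem itself, so the fix is still to add “-acc” when linking the executable. On systems with an older glibc you may also need `-ldl` on the link line.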

% setenv NV_ACC_TIME 1
% nvc++ -acc -cuda -gpu=lineinfo,nordc -fPIC -o minimal_directives.o -c minimal_directives.cpp
% nvc++ -acc -cuda -gpu=lineinfo,nordc -fPIC -shared -o libshared.so minimal_directives.o
% nvc++ -o main minimal_main.cpp
% ./main
Current file:     /local/home/mcolgrove/minimal_directives.cpp
        function: launch
        line:     4
This file was compiled: -acc=gpu -gpu=cc80
% nvc++ -o main minimal_main.cpp -acc
% ./main

Accelerator Kernel Timing data
/local/home/mcolgrove/minimal_directives.cpp
  launch  NVIDIA  devicenum=0
    time(us): 56
    4: compute region reached 1 time
        4: kernel launched 1 time
            grid: [1]  block: [1]
             device time(us): total=5 max=5 min=5 avg=5
            elapsed time(us): total=308 max=308 min=308 avg=308
    4: data region reached 2 times
        4: data copyin transfers: 1
             device time(us): total=8 max=8 min=8 avg=8
        6: data copyout transfers: 1
             device time(us): total=43 max=43 min=43 avg=43