Looks like another bug which I’ve reported as TPR#24875.
The problem is that in 17.1 we switched OpenACC from using the default CUDA stream to a different stream, which removed some unneeded synchronization. However, when CUDA is linked with OpenACC, we have to revert to the default CUDA stream. This is what the call to “__pgi_uacc_set_cuda” does.
To get this call into the init section of the binary, we create a small assembly file at link time that gets assembled and then linked in with your final executable or, in this case, shared object. The problem is that our compiler engineers missed the shared object case, so the generated call doesn’t have “@PLT” appended and therefore isn’t position-independent (PIC).
If you don’t mind a little extra work, I do have a short-term workaround.
First, link your code with the verbose flag, “-v”. Towards the end of the output you’ll see a line that runs the “pgimport” utility. Copy this line and change the temp assembly file name to something else. Then edit the generated file to add “@PLT” at the end of the call to “__pgi_uacc_set_cuda”.
% /proj/pgi/linux86-64/17.9/bin/pgimport set_cuda.s -init __pgi_uacc_set_cuda
% vi set_cuda.s
% cat set_cuda.s
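If you’d rather not make the edit by hand in vi, the same change can be sketched with sed. This is only a sketch: it assumes the generated file contains a call line that ends with the symbol name, which may not hold for every release, and “patch_plt” is a name made up for this example.

```shell
# Append "@PLT" to the call to __pgi_uacc_set_cuda in an assembly file.
# Assumption: the call appears on a line containing "call" that ends with
# the symbol name. Lines already ending in "@PLT" are left untouched.
patch_plt() {
    # $1: assembly file to patch in place (for example set_cuda.s)
    sed '/call/s/__pgi_uacc_set_cuda[[:space:]]*$/__pgi_uacc_set_cuda@PLT/' "$1" > "$1.tmp" \
        && mv "$1.tmp" "$1"
}
```

Running “patch_plt set_cuda.s” and then “cat set_cuda.s” lets you confirm that the call line now ends in “@PLT” before you assemble it.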
Next, assemble this file:
% /usr/local/bin/as -o set_cuda.o set_cuda.s
Now copy the link line, which starts with “ld”, and replace the temp object that begins with “/tmp/pgcudafat…o” (it should be at the beginning of the link line) with your “set_cuda.o” object.
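If the link line is long, the substitution can also be scripted. The sketch below assumes you have saved the copied “ld” line to a text file; “swap_object” and the file name “link.txt” are hypothetical names used only for illustration, and the pattern simply matches any object whose path starts with “/tmp/pgcudafat”.

```shell
# Print a saved "ld" line with the temporary pgcudafat object replaced by
# set_cuda.o. Only the first match on each line is replaced.
swap_object() {
    # $1: text file holding the copied link line (hypothetical, e.g. link.txt)
    sed 's#/tmp/pgcudafat[^[:space:]]*\.o#set_cuda.o#' "$1"
}
```

“swap_object link.txt” prints the edited line so you can review it before running it.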
I’ve tested this workaround here and was able to generate a shared object.
Apologies for not having a better workaround!