I am reworking an existing standalone C/C++/OpenACC CFD solver as a shared library for inclusion in a larger application. From a high level, it looks like this:
main application, compiled and linked with gcc or PGI.
– geometry, mesh, graphics, etc. libraries accessed with dlopen, all compiled with gcc.
– solver library accessed with dlopen, compiled with PGI and OpenACC support (rough loading sketch below).
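For reference, the loading path on the main-application side is the usual dlopen/dlsym pattern. A rough sketch of what we do (the library and symbol names here are placeholders, not the real ones):

```cpp
/* main application side -- compiled and linked with gcc.
   "libsolver.so" and "solver_init" are placeholder names. */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("./libsolver.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* solver entry point is exported with C linkage */
    int (*solver_init)(void) = (int (*)(void)) dlsym(handle, "solver_init");
    if (!solver_init) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
    }

    int rc = solver_init();   /* hangs in here when OpenACC is enabled */
    dlclose(handle);
    return rc;
}
```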
A first test with OpenACC disabled was successful: the solver library ran as expected. However, when I compile with OpenACC enabled I run into issues. The solver starts up as expected, and nvidia-smi shows the expected number of MPI processes assigned to the targeted device (set with CUDA_VISIBLE_DEVICES). The application then hangs at the first acc API call (acc_get_memory() in this case).
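On the solver-library side, the first thing that runs looks roughly like this (simplified; solver_init is the same placeholder entry point as above, and acc_get_memory() is the PGI extension that reports total device memory, which I believe comes from openacc.h):

```cpp
/* solver library side -- compiled with pgc++ and OpenACC enabled */
#include <openacc.h>
#include <stddef.h>
#include <stdio.h>

extern "C" int solver_init(void)
{
    /* first OpenACC runtime call in the library: this is where it hangs */
    size_t total = acc_get_memory();    /* PGI extension: total device memory */
    printf("device memory: %zu bytes\n", total);

    /* device selection, data allocation, etc. would follow here */
    return 0;
}
```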
I have seen other postings indicating that the -ta=tesla:nordc flag must be used when building a shared library. Is this still the recommended solution for PGI 19.X? That is unfortunately a problem for my solver, which is a big C/C++ monster with extensive use of acc routine and global variables.
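To give a feel for why nordc would hurt here, the solver is full of patterns like the simplified sketch below: device routines defined in one translation unit and called from compute regions in another, plus globals made device-resident with acc declare. My understanding is that both of these need relocatable device code, which nordc turns off, so please correct me if I have that wrong. (The names and bodies below are placeholders.)

```cpp
/* flux.cpp -- device routine defined in this translation unit */
#pragma acc routine seq
double flux(double ul, double ur)
{
    return 0.5 * (ul + ur);            /* placeholder body */
}

/* solver.cpp -- device-resident global, plus a compute region that
   calls flux() from the other translation unit */
double gamma_gas = 1.4;                /* placeholder global */
#pragma acc declare copyin(gamma_gas)

#pragma acc routine seq
extern double flux(double ul, double ur);

void update(const double *u, double *f, int n)
{
    #pragma acc parallel loop copyin(u[0:n]) copyout(f[0:n])
    for (int i = 1; i < n; ++i)
        f[i] = gamma_gas * flux(u[i-1], u[i]);
}
```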
A fallback would be to link the solver in statically, but that would go against the larger design philosophy of this application. I also intend to try inlining, but it would mean a lot of inlined code, which makes me a bit wary.
Thanks for your help,