Failed CUDA device detection when explicitly linking libnvc

Hello,
I am opening another topic on the NVIDIA compilers (we are going through all the requests and issues received from our CINECA users).
When one explicitly links the libnvc library to an OpenACC code, the CUDA device detection fails and no GPU is used.
As an example, we tried the Linux_ppc64le/22.2/examples/OpenACC/samples/acc_f1.
With the current Makefile, where:

ACCFLAGS = -Minfo -acc $(OPT)

everything works fine, and with PGI_ACC_DEBUG=1 exported we read:

ACC: detected 4 CUDA devices

as expected on our nodes with 4 Volta GPUs.

If we add -lnvc to the ACCFLAGS options in the Makefile, we instead obtain:

ACC: device[1] is PGI native

and no GPU is used.
As reported by our users:
“many toolchains will automatically explicitly link against nvc themselves. I ran into this problem, because my self-compiled petsc library (using my self-compiled spack) will always include -lnvc in its automatic addition to linking and compiling to all dependent projects.”
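
To tell the two cases above apart automatically (for example in a build check), something like the following shell sketch could be used. It only looks for the two debug messages quoted above; the function name classify_acc_output is hypothetical, and the executable name in the usage comment is an assumption.

```shell
# Classify OpenACC debug output (as printed with PGI_ACC_DEBUG=1), read from stdin.
# Prints "cuda" if the "detected N CUDA devices" message is present,
# "native" if only a "PGI native" device is reported, "unknown" otherwise.
classify_acc_output() {
    acc_log=$(cat)
    case "$acc_log" in
        *"ACC: detected "*" CUDA devices"*) echo cuda ;;
        *"is PGI native"*)                  echo native ;;
        *)                                  echo unknown ;;
    esac
}

# Usage (the executable name ./acc_f1.out is an assumption; adjust to your build):
#   PGI_ACC_DEBUG=1 ./acc_f1.out 2>&1 | classify_acc_output
```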

The obvious solution is to check the Makefiles and remove the explicit link, but we wonder what is going on: libnvc is always linked into the final executable anyway, so why does linking it explicitly cause the CUDA device detection to fail?

Thanks a lot for your assistance,
Isabella

Hi Isabella,

I suspect what’s happening is that the order in which libraries appear on the link line matters. In this case, explicitly adding “-lnvc” puts it ahead of the “acc_init” objects. This causes the initialization routines not to be used, and hence no GPU is initialized.

Granted, because the order of library dependencies on the link line matters, users should probably avoid explicitly adding compiler runtime libraries to their link. But I understand it’s not uncommon, and this user may not have control over it since they are using a third-party package.

I put in an issue report, TPR #31363, and hopefully engineering can find a solution for you.

-Mat

Hi Isabella,

Engineering discussed this but sees it as problematic for the compiler to manipulate user-supplied libraries (i.e. “-l”), since this could cause unexpected behavior.

Hence they consider this a user error and suggest that the PETSc build not include “-lnvc” (or other compiler runtime libraries) on the link.
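
If the third-party build can’t easily be changed, one possible workaround is to filter “-lnvc” out of the inherited link flags before they reach the compiler. Below is a minimal shell sketch of that idea. The function name strip_runtime_libs, the variable names, and the example flag values are all made up for illustration, and it relies on plain word splitting, so it assumes no flag contains spaces.

```shell
# Hypothetical helper: print the given link flags with any explicit
# "-lnvc" removed, leaving all other flags in their original order.
strip_runtime_libs() {
    kept_flags=""
    for flag in "$@"; do
        case "$flag" in
            -lnvc) ;;  # drop the explicit compiler runtime library
            *) kept_flags="${kept_flags:+$kept_flags }$flag" ;;
        esac
    done
    printf '%s\n' "$kept_flags"
}

# Example with made-up flags, as a third-party package might export them.
PETSC_LINK_FLAGS="-L/opt/petsc/lib -lpetsc -lnvc -lm"
# Intentionally unquoted so the shell splits the string into separate flags.
CLEAN_LINK_FLAGS=$(strip_runtime_libs $PETSC_LINK_FLAGS)
echo "$CLEAN_LINK_FLAGS"   # prints: -L/opt/petsc/lib -lpetsc -lm
```

The compiler still adds its runtime libraries itself, in the order it expects, so nothing should be lost by dropping the explicit flag.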

-Mat