Errors Linking with PGI 16.9

All,

Today I tried to move from PGI 16.5 to 16.9. The CPU build is fine, but when I compile for GPU, a link step throws:

mpif90 -L/discover/swdev/mathomp4/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -L/discover/swdev/mathomp4/Models/Heracles-UNSTABLE-PGI16/GEOSagcm/Linux/lib  -pgc++libs -tp=px-64 -o Chem_Aod.x Chem_Aod.o libChem_Base.a /discover/swdev/mathomp4/Models/Heracles-UNSTABLE-PGI16/GEOSagcm/Linux/lib/libMAPL_Base.a /discover/swdev/mathomp4/Models/Heracles-UNSTABLE-PGI16/GEOSagcm/Linux/lib/libMAPL_Base_stubs.a  /discover/swdev/mathomp4/Models/Heracles-UNSTABLE-PGI16/GEOSagcm/Linux/lib/libGMAO_eu.a /discover/swdev/mathomp4/Models/Heracles-UNSTABLE-PGI16/GEOSagcm/Linux/lib/libMAPL_cfio_r4.a /discover/swdev/mathomp4/Models/Heracles-UNSTABLE-PGI16/GEOSagcm/Linux/lib/libGMAO_gfio_r4.a /discover/swdev/mathomp4/Models/Heracles-UNSTABLE-PGI16/GEOSagcm/Linux/lib/libGMAO_mpeu.a  -L/discover/swdev/mathomp4/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -lnetcdff -lnetcdf -L/discover/swdev/mathomp4/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -L/discover/swdev/mathomp4/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -lnetcdf -lhdf5_hl -lhdf5 -lmfhdf -ldf -lsz -ljpeg -lgpfs -lcurl -lssl -lcrypto -lz -lrt -ldl -lm -L/discover/swdev/mathomp4/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -lcurl -lssl -lcrypto -lssl -lcrypto -ldl -lz -lz -lrt -lm /discover/swdev/mathomp4/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib/libesmf.a -L/usr/local/other/SLES11.3/openmpi/2.0.1/pgi-16.9.0/lib -I/usr/local/other/SLES11.3/openmpi/2.0.1/pgi-16.9.0/lib -L/usr/local/other/SLES11.3/openmpi/2.0.1/pgi-16.9.0/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -L/usr/local/other/SLES11.3/openmpi/2.0.1/pgi-16.9.0/lib -lmpi -ldl -lrt -Mcuda=nofma,ptxinfo,7.0,cc35,maxregcount:72 -acc -ta=nvidia:wait,nofma,7.0,cc35,maxregcount:72 
-Minfo=accel,ccff
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:1:36: error: redefinition of ‘__cudaRegisterLinkedBinary_17m_StrTemplate_F90’
 #define __REGISTERFUNCNAME_CORE(X) __cudaRegisterLinkedBinary##X
                                    ^
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:2:31: note: in expansion of macro ‘__REGISTERFUNCNAME_CORE’
 #define __REGISTERFUNCNAME(X) __REGISTERFUNCNAME_CORE(X)
                               ^~~~~~~~~~~~~~~~~~~~~~~
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:7:6: note: in expansion of macro ‘__REGISTERFUNCNAME’
 void __REGISTERFUNCNAME(id) (void (*callback_fp)(void **), void *prelinked_fatbinc, void *foo, void (*dummy_ref)(void *)) \
      ^~~~~~~~~~~~~~~~~~
/tmp/pgcudaWKFeqNcRW1h1.reg.c:8:1: note: in expansion of macro ‘DEFINE_REGISTER_FUNC’
 DEFINE_REGISTER_FUNC(_17m_StrTemplate_F90)
 ^~~~~~~~~~~~~~~~~~~~
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:1:36: note: previous definition of ‘__cudaRegisterLinkedBinary_17m_StrTemplate_F90’ was here
 #define __REGISTERFUNCNAME_CORE(X) __cudaRegisterLinkedBinary##X
                                    ^
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:2:31: note: in expansion of macro ‘__REGISTERFUNCNAME_CORE’
 #define __REGISTERFUNCNAME(X) __REGISTERFUNCNAME_CORE(X)
                               ^~~~~~~~~~~~~~~~~~~~~~~
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:7:6: note: in expansion of macro ‘__REGISTERFUNCNAME’
 void __REGISTERFUNCNAME(id) (void (*callback_fp)(void **), void *prelinked_fatbinc, void *foo, void (*dummy_ref)(void *)) \
      ^~~~~~~~~~~~~~~~~~
/tmp/pgcudaWKFeqNcRW1h1.reg.c:5:1: note: in expansion of macro ‘DEFINE_REGISTER_FUNC’
 DEFINE_REGISTER_FUNC(_17m_StrTemplate_F90)
 ^~~~~~~~~~~~~~~~~~~~
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:1:36: error: redefinition of ‘__cudaRegisterLinkedBinary_10m_zeit_F90’
 #define __REGISTERFUNCNAME_CORE(X) __cudaRegisterLinkedBinary##X
                                    ^
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:2:31: note: in expansion of macro ‘__REGISTERFUNCNAME_CORE’
 #define __REGISTERFUNCNAME(X) __REGISTERFUNCNAME_CORE(X)
                               ^~~~~~~~~~~~~~~~~~~~~~~
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:7:6: note: in expansion of macro ‘__REGISTERFUNCNAME’
 void __REGISTERFUNCNAME(id) (void (*callback_fp)(void **), void *prelinked_fatbinc, void *foo, void (*dummy_ref)(void *)) \
      ^~~~~~~~~~~~~~~~~~
/tmp/pgcudaWKFeqNcRW1h1.reg.c:9:1: note: in expansion of macro ‘DEFINE_REGISTER_FUNC’
 DEFINE_REGISTER_FUNC(_10m_zeit_F90)
 ^~~~~~~~~~~~~~~~~~~~
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:1:36: note: previous definition of ‘__cudaRegisterLinkedBinary_10m_zeit_F90’ was here
 #define __REGISTERFUNCNAME_CORE(X) __cudaRegisterLinkedBinary##X
                                    ^
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:2:31: note: in expansion of macro ‘__REGISTERFUNCNAME_CORE’
 #define __REGISTERFUNCNAME(X) __REGISTERFUNCNAME_CORE(X)
                               ^~~~~~~~~~~~~~~~~~~~~~~
/usr/local/sles11/pgi/linux86-64/16.9/include_acc/linkstub70.c:7:6: note: in expansion of macro ‘__REGISTERFUNCNAME’
 void __REGISTERFUNCNAME(id) (void (*callback_fp)(void **), void *prelinked_fatbinc, void *foo, void (*dummy_ref)(void *)) \
      ^~~~~~~~~~~~~~~~~~
/tmp/pgcudaWKFeqNcRW1h1.reg.c:6:1: note: in expansion of macro ‘DEFINE_REGISTER_FUNC’
 DEFINE_REGISTER_FUNC(_10m_zeit_F90)
 ^~~~~~~~~~~~~~~~~~~~
pgacclnk: child process exit status 2: /usr/local/sles11/pgi/linux86-64/16.9/bin/pgnvd
make: *** [Chem_Aod.x] Error 2

Note the same options worked fine with PGI 16.5, so maybe a flag has changed behavior? Any ideas?

Matt

Note, on a different machine (different OS, etc.) I get:

mpif90 -L/ford1/share/gmao_SIteam/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -L/home/mathomp4/Models/Heracles-UNSTABLE-PGI169/GEOSagcm/Linux/lib  -pgc++libs -tp=px-64 -o Chem_Aod.x Chem_Aod.o libChem_Base.a /home/mathomp4/Models/Heracles-UNSTABLE-PGI169/GEOSagcm/Linux/lib/libMAPL_Base.a /home/mathomp4/Models/Heracles-UNSTABLE-PGI169/GEOSagcm/Linux/lib/libMAPL_Base_stubs.a  /home/mathomp4/Models/Heracles-UNSTABLE-PGI169/GEOSagcm/Linux/lib/libGMAO_eu.a /home/mathomp4/Models/Heracles-UNSTABLE-PGI169/GEOSagcm/Linux/lib/libMAPL_cfio_r4.a /home/mathomp4/Models/Heracles-UNSTABLE-PGI169/GEOSagcm/Linux/lib/libGMAO_gfio_r4.a /home/mathomp4/Models/Heracles-UNSTABLE-PGI169/GEOSagcm/Linux/lib/libGMAO_mpeu.a  -L/ford1/share/gmao_SIteam/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -lnetcdff -lnetcdf -L/ford1/share/gmao_SIteam/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -L/ford1/share/gmao_SIteam/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -lnetcdf -lhdf5_hl -lhdf5 -lmfhdf -ldf -lsz -ljpeg -lcurl -lssl -lcrypto -lz -lrt -ldl -lm -L/ford1/share/gmao_SIteam/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib -lcurl -lssl -lcrypto -lssl -lcrypto -lz -lrt -lm /ford1/share/gmao_SIteam/Baselibs/GMAO-Baselibs-4_0_8/x86_64-unknown-linux-gnu/pgfortran_16.9-openmpi_2.0.1/Linux/lib/libesmf.a -L/ford1/share/gmao_SIteam/MPI/openmpi_2.0.1-pgi_16.9/lib -I/ford1/share/gmao_SIteam/MPI/openmpi_2.0.1-pgi_16.9/lib -L/ford1/share/gmao_SIteam/MPI/openmpi_2.0.1-pgi_16.9/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -L/ford1/share/gmao_SIteam/MPI/openmpi_2.0.1-pgi_16.9/lib -lmpi -ldl -lrt -Mcuda=nofma,ptxinfo,7.0,cc20,maxregcount:72 -acc -ta=nvidia:wait,nofma,7.0,cc20,maxregcount:72 -Minfo=accel,ccff
In file included from /opt/pgi/linux86-64/16.9/include_acc/linkstub70.c:10:
/tmp/pgcudam6XbKIbzSNe0.reg.c:8: error: redefinition of ‘__cudaRegisterLinkedBinary_17m_StrTemplate_F90’
/tmp/pgcudam6XbKIbzSNe0.reg.c:5: note: previous definition of ‘__cudaRegisterLinkedBinary_17m_StrTemplate_F90’ was here
/tmp/pgcudam6XbKIbzSNe0.reg.c:9: error: redefinition of ‘__cudaRegisterLinkedBinary_10m_zeit_F90’
/tmp/pgcudam6XbKIbzSNe0.reg.c:6: note: previous definition of ‘__cudaRegisterLinkedBinary_10m_zeit_F90’ was here
pgacclnk: child process exit status 2: /opt/pgi/linux86-64/16.9/bin/pgnvd
make: *** [Chem_Aod.x] Error 2

Answering my own question, it looks like PGI’s linker got a bit more strict and this is a fault on our end with our libraries.

I think. I’m debugging now.

ETA: No. This does seem to be a definite change in PGI’s behavior. While the one above is our fault, I’ve found more issues with this in later links. Does anyone know what changed from 16.5 to 16.9 with the linking?

One more piece of information: this error only occurs when compiling for GPU. If I don’t compile for GPU, everything is fine. Hmm.

Hi Matt,

This doesn’t look like a linker issue. Upon load of the binary, the runtime will register all the CUDA kernels with the CUDA driver. To do this, at link time the compiler creates a registry routine for each file. For some reason, it appears that two registry routines were created for the “StrTemplate.F90” and “zeit.F90” files.
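To see which source files are affected, the duplicated registration symbols can be pulled out of the build log. Here is a minimal Python sketch. The helper name `redefined_registrations` is made up for illustration, and reading the `_17` / `_10` prefix as the length of the id that follows is an inference from the two symbols in this thread (`m_StrTemplate_F90` is 17 characters, `m_zeit_F90` is 10), not documented PGI behavior.

```python
import re
from collections import Counter

# Matches gcc "redefinition" errors for CUDA registration stubs, e.g.
#   error: redefinition of ‘__cudaRegisterLinkedBinary_17m_StrTemplate_F90’
PATTERN = re.compile(
    r"error: redefinition of \W?__cudaRegisterLinkedBinary_(\d+)(\w+)"
)

def redefined_registrations(log_text):
    """Return a Counter of module ids whose registration stub was redefined."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        length, name = int(match.group(1)), match.group(2)
        # Assumption: the digits are a length prefix for the id that follows.
        counts[name[:length]] += 1
    return counts

# Three lines copied from the second build log in this thread.
sample_log = """\
/tmp/pgcudam6XbKIbzSNe0.reg.c:8: error: redefinition of ‘__cudaRegisterLinkedBinary_17m_StrTemplate_F90’
/tmp/pgcudam6XbKIbzSNe0.reg.c:5: note: previous definition of ‘__cudaRegisterLinkedBinary_17m_StrTemplate_F90’ was here
/tmp/pgcudam6XbKIbzSNe0.reg.c:9: error: redefinition of ‘__cudaRegisterLinkedBinary_10m_zeit_F90’
"""

dups = redefined_registrations(sample_log)
print(sorted(dups))
```

Only the `error:` lines are counted, so the `note: previous definition` lines do not inflate the totals.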

I’m not sure whether this is a compiler issue or whether your build happens to include the StrTemplate.o and zeit.o files twice. I lean towards a compiler issue, given that we had a similar issue with your application a few years ago. That said, TPR#20408 was fixed in 14.7, and I verified that the fix still works in 16.9.

Can you please send a reproducing example to PGI Customer Service?

  • Mat

Well, like many of my issues, the reproducer is our model. Yay. That said, I’ve gotten better at making our model “portable”. Well, """portable""". If you’d like, I could work on getting it to you and work on getting it building and even running with portable boundary conditions.

I have sort of figured out a way around this issue. In some cases, yes, a library was linked in twice. That’s a bad thing. So I fixed that. Now, in other cases, we have a pseudo-circular dependency (we’re working on removing it): lib1 needs lib2 which needs lib1, and most compilers (PGI included) handle it just fine.
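A quick way to spot the first case (an archive or object named twice on the link line) is to tokenize the command and count repeated inputs. The sketch below is one way to do that in Python. Both `repeated_inputs` and the sample command are hypothetical, and it only compares normalized path strings, so it will not notice the same object file bundled into two different archives.

```python
import os
import shlex
from collections import Counter

def repeated_inputs(link_command):
    """Return archives/objects that appear more than once on a link line."""
    tokens = shlex.split(link_command)
    # Only explicit file inputs; -l/-L flags and other options are skipped.
    files = [
        os.path.normpath(tok)
        for tok in tokens
        if not tok.startswith("-") and tok.endswith((".a", ".o"))
    ]
    counts = Counter(files)
    return {path: n for path, n in counts.items() if n > 1}

# Hypothetical, shortened link line for illustration only.
cmd = "mpif90 -o demo.x demo.o lib/libA.a lib/libB.a ./lib/libA.a -lnetcdf -lnetcdf"
print(repeated_inputs(cmd))
```

Repeated `-l` flags are ignored on purpose, since the thread only identifies explicitly listed libraries as the ones that were linked in twice.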

I figured out I could get around this with

-Wl,--whole-archive lib1 lib2 -Wl,--no-whole-archive

but, when I do that, the final, final link now throws:

nvlink error   : File uses too much global constant data (0x1c867 bytes, 0x10000 max)

So…yeah.
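For scale, the two hexadecimal figures in that nvlink message can be converted to decimal. The snippet is plain arithmetic on the values quoted above. The only outside fact it leans on is that 0x10000 bytes is 64 KiB, which matches the constant-memory size of CUDA devices.

```python
used = 0x1c867   # bytes of global constant data reported by nvlink
limit = 0x10000  # maximum reported by nvlink (64 KiB)

over = used - limit
print(f"used  = {used} bytes")
print(f"limit = {limit} bytes ({limit // 1024} KiB)")
print(f"over  = {over} bytes ({used / limit:.2f}x the limit)")
```

So the link is asking for roughly 1.8 times the allowed constant data, not just a few bytes over.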

Matt

(Note, as I said, the CPU build does not care about the circular or double dependency at all. Only the GPU one. And GPU PGI 16.5 is fine with it too.)

Ok, so it’s probably the multiple weak references that are causing it.

I don’t think anything has changed between 16.5 and 16.9 that would cause this, but I will ask Michael Wolfe when he’s back in the office on Monday.

  • Mat