Linking Error in OpenACC Code

I am getting a link-time error when I try to build an OpenACC code (that is mostly Fortran, but does have some C and C++ code linked in). What’s especially odd to me is that I only get this error on my new machine, not on another machine using the same version of PGI (16.9).

I just set up a new system that has a GPU with the Pascal architecture (GeForce GTX 1060). I installed the CUDA 8.0 toolkit and then the PGI Accelerator Fortran/C/C++ Workstation for Linux on my Fedora 24 system. Because my default gcc is so new, I had to build gcc 5.3.1 and tell PGI’s C++ frontend to use it.

My GPU details are:

$ pgaccelinfo 

CUDA Driver Version:           8000
NVRM version:                  NVIDIA UNIX x86_64 Kernel Module  367.44  Wed Aug 17 22:24:07 PDT 2016

Device Number:                 0
Device Name:                   GeForce GTX 1060 6GB
Device Revision Number:        6.1
Global Memory Size:            6366756864
Number of Multiprocessors:     10
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           65536
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       2147483647 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    1847 MHz
Execution Timeout:             Yes
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            Yes
ECC Enabled:                   No
Memory Clock Rate:             4104 MHz
Memory Bus Width:              192 bits
L2 Cache Size:                 1572864 bytes
Max Threads Per SMP:           2048
Async Engines:                 2
Unified Addressing:            Yes
Managed Memory:                Yes
PGI Compiler Option:           -ta=tesla:cc60

My localrc file looks like:

$ cat /opt/pgi/linux86-64/16.9/bin/localrc.xrb 
set LFC=-lgfortran;
set LDSO=/lib64/ld-linux-x86-64.so.2;
set GCCDIR=/usr/lib/gcc/x86_64-redhat-linux/6.2.1;
set GPPDIR32= /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/../../../../include/c++/5.3.1 /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/../../../../include/c++/5.3.1/x86_64-unknown-linux-gnu/. /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/../../../../include/c++/5.3.1/backward /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/include /usr/local/include /opt/gcc/gcc-5.3.1/include /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/include-fixed /usr/include In file included from /usr/include/features.h:392:0, from /usr/include/stdio.h:27, /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory;
set GPPDIR64= /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/../../../../include/c++/5.3.1 /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/../../../../include/c++/5.3.1/x86_64-unknown-linux-gnu /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/../../../../include/c++/5.3.1/backward /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/include /usr/local/include /opt/gcc/gcc-5.3.1/include /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/include-fixed /usr/include;
set GCCINC32= /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include /usr/local/include /usr/include In file included from /usr/include/features.h:392:0, from /usr/include/stdio.h:27, /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory # include <gnu/stubs-32.h>;
set GCCINC64= /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include /usr/local/include /usr/include;
set G77DIR=/usr/lib/gcc/x86_64-redhat-linux/6.2.1/;
set OEM_INFO=64-bit target on x86-64 Linux $INFOTPVAL;
set GCCINC= /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include /usr/local/include /usr/include;
set GPPDIR= /opt/gcc/gcc-5.3.1/include/c++/5.3.1 /opt/gcc/gcc-5.3.1/include/c++/5.3.1/x86_64-unknown-linux-gnu /opt/gcc/gcc-5.3.1/include/c++/5.3.1/backward /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/include /usr/local/include /opt/gcc/gcc-5.3.1/include /opt/gcc/gcc-5.3.1/lib/gcc/x86_64-unknown-linux-gnu/5.3.1/include-fixed /usr/include;
set LOCALRC=YES;
set THROW=__THROW=;
set EXTENSION=__extension__=;
set COMPGCCINCDIR=include-gcc50;
set LC=$if(-Bstatic,-lgcc -lgcc_eh -lc -lgcc -lgcc_eh -lc, -lgcc -lc -lgcc -lgcc_s);
# GLIBC version 2.23
# GCC version 6.2.1
set GCCVERSION=60201;
set LOCALDEFS=__STDC_HOSTED__;
export PGI=$COMPBASE;
set DEFCUDAVERSION=8.0;
# makelocalrc executed by root Tue Oct 4 16:23:40

I’ve just recently set this system up and am only trying to build one of our basic OpenACC codes to verify the install. I can build and run this code on other machines with PGI 16.9 and Fedora 24. The only difference I can think of is possibly the CUDA version and the fact that they have Maxwells (GTX 960).

When I build the code, the link line and subsequent errors are:

pgf95  -module t/Linux.PGI.debug.acc/m -It/Linux.PGI.debug.acc/m -acc -Minfo=acc -g   -pgc++libs   -o testburn.Linux.PGI.debug.acc.exe testburn.f90 t/Linux.PGI.debug.acc/o/BLProfiler_f90.o t/Linux.PGI.debug.acc/o/actual_burner.o t/Linux.PGI.debug.acc/o/actual_network.o t/Linux.PGI.debug.acc/o/actual_rhs.o t/Linux.PGI.debug.acc/o/backtrace_f.o t/Linux.PGI.debug.acc/o/bc.o t/Linux.PGI.debug.acc/o/bc_functions.o t/Linux.PGI.debug.acc/o/bl_IO.o t/Linux.PGI.debug.acc/o/bl_constants.o t/Linux.PGI.debug.acc/o/bl_error.o t/Linux.PGI.debug.acc/o/bl_mem_stat.o t/Linux.PGI.debug.acc/o/bl_parmparse.o t/Linux.PGI.debug.acc/o/bl_prof_stubs.o t/Linux.PGI.debug.acc/o/bl_space.o t/Linux.PGI.debug.acc/o/bl_stream.o t/Linux.PGI.debug.acc/o/bl_string.o t/Linux.PGI.debug.acc/o/bl_system.o t/Linux.PGI.debug.acc/o/bl_timer.o t/Linux.PGI.debug.acc/o/bl_types.o t/Linux.PGI.debug.acc/o/bndry_reg.o t/Linux.PGI.debug.acc/o/box_f.o t/Linux.PGI.debug.acc/o/box_util.o t/Linux.PGI.debug.acc/o/boxarray_f.o t/Linux.PGI.debug.acc/o/boxlib_f.o t/Linux.PGI.debug.acc/o/build_info.o t/Linux.PGI.debug.acc/o/cc_restriction.o t/Linux.PGI.debug.acc/o/cluster_f.o t/Linux.PGI.debug.acc/o/constants_cgs.o t/Linux.PGI.debug.acc/o/create_umac_grown.o t/Linux.PGI.debug.acc/o/cutcells.o t/Linux.PGI.debug.acc/o/define_bc_tower.o t/Linux.PGI.debug.acc/o/eos_type.o t/Linux.PGI.debug.acc/o/f2kcli.o t/Linux.PGI.debug.acc/o/fab.o t/Linux.PGI.debug.acc/o/fabio.o t/Linux.PGI.debug.acc/o/filler.o t/Linux.PGI.debug.acc/o/fillpatch.o t/Linux.PGI.debug.acc/o/fourth_order_interp_coeffs.o t/Linux.PGI.debug.acc/o/integration_data.o t/Linux.PGI.debug.acc/o/integrator.o t/Linux.PGI.debug.acc/o/interp.o t/Linux.PGI.debug.acc/o/knapsack.o t/Linux.PGI.debug.acc/o/layout.o t/Linux.PGI.debug.acc/o/list_box.o t/Linux.PGI.debug.acc/o/make_new_grids.o t/Linux.PGI.debug.acc/o/mempool_f.o t/Linux.PGI.debug.acc/o/microphysics.o t/Linux.PGI.debug.acc/o/ml_boxarray.o t/Linux.PGI.debug.acc/o/ml_cc_restriction.o 
t/Linux.PGI.debug.acc/o/ml_layout.o t/Linux.PGI.debug.acc/o/ml_multifab.o t/Linux.PGI.debug.acc/o/ml_nd_restriction.o t/Linux.PGI.debug.acc/o/ml_restrict_fill.o t/Linux.PGI.debug.acc/o/multifab_f.o t/Linux.PGI.debug.acc/o/multifab_fill_ghost_cells.o t/Linux.PGI.debug.acc/o/multifab_physbc.o t/Linux.PGI.debug.acc/o/multifab_physbc_edgevel.o t/Linux.PGI.debug.acc/o/network.o t/Linux.PGI.debug.acc/o/nodal_neumann_bcs.o t/Linux.PGI.debug.acc/o/nodal_restriction.o t/Linux.PGI.debug.acc/o/nodal_stencil_bc.o t/Linux.PGI.debug.acc/o/numerical_jacobian.o t/Linux.PGI.debug.acc/o/omp_stubs.o t/Linux.PGI.debug.acc/o/parallel_stubs.o t/Linux.PGI.debug.acc/o/plotfile.o t/Linux.PGI.debug.acc/o/ppm_util.o t/Linux.PGI.debug.acc/o/probin.o t/Linux.PGI.debug.acc/o/rate_type.o t/Linux.PGI.debug.acc/o/regrid.o t/Linux.PGI.debug.acc/o/screen.o t/Linux.PGI.debug.acc/o/sort_box.o t/Linux.PGI.debug.acc/o/sort_d.o t/Linux.PGI.debug.acc/o/sort_i.o t/Linux.PGI.debug.acc/o/tag_boxes.o t/Linux.PGI.debug.acc/o/temperature_integration.o t/Linux.PGI.debug.acc/o/vector_i.o t/Linux.PGI.debug.acc/o/write_job_info.o t/Linux.PGI.debug.acc/o/actual_eos.o t/Linux.PGI.debug.acc/o/actual_integrator.o t/Linux.PGI.debug.acc/o/bs_jac.o t/Linux.PGI.debug.acc/o/bs_rhs.o t/Linux.PGI.debug.acc/o/bs_type.o t/Linux.PGI.debug.acc/o/burn_type.o t/Linux.PGI.debug.acc/o/eos.o t/Linux.PGI.debug.acc/o/microphysics_math.o t/Linux.PGI.debug.acc/o/rpar.o t/Linux.PGI.debug.acc/o/stiff_ode.o t/Linux.PGI.debug.acc/o/daxpy.o t/Linux.PGI.debug.acc/o/dcopy.o t/Linux.PGI.debug.acc/o/ddot.o t/Linux.PGI.debug.acc/o/dgbfa.o t/Linux.PGI.debug.acc/o/dgbsl.o t/Linux.PGI.debug.acc/o/dgefa.o t/Linux.PGI.debug.acc/o/dgemm.o t/Linux.PGI.debug.acc/o/dgesl.o t/Linux.PGI.debug.acc/o/dscal.o t/Linux.PGI.debug.acc/o/idamax.o t/Linux.PGI.debug.acc/o/vddot.o t/Linux.PGI.debug.acc/o/fabio_c.o t/Linux.PGI.debug.acc/o/ppm_util_c.o t/Linux.PGI.debug.acc/o/system_util_c.o t/Linux.PGI.debug.acc/o/timer_c.o t/Linux.PGI.debug.acc/o/Arena.o 
t/Linux.PGI.debug.acc/o/CArena.o t/Linux.PGI.debug.acc/o/MemPool.o t/Linux.PGI.debug.acc/o/backtrace_c.o    
testburn.f90:
nvlink error   : Undefined reference to '_eos_type_module_24' in 'testburn.o'
nvlink error   : Undefined reference to '_actual_network_16' in 'testburn.o'
nvlink error   : Undefined reference to '_actual_eos_module_24' in 'testburn.o'
nvlink error   : Undefined reference to '_actual_eos_module_16' in 'testburn.o'
nvlink error   : Undefined reference to '_screening_module_24' in 't/Linux.PGI.debug.acc/o/actual_rhs.o'
nvlink error   : Undefined reference to '_screening_module_16' in 't/Linux.PGI.debug.acc/o/actual_rhs.o'
nvlink error   : Undefined reference to '_extern_probin_module_24' in 't/Linux.PGI.debug.acc/o/actual_rhs.o'
nvlink error   : Undefined reference to '_bs_type_module_16' in 't/Linux.PGI.debug.acc/o/integrator.o'
pgacclnk: child process exit status 2: /opt/pgi/linux86-64/16.9/bin/pgnvd
/home/ajacobs/Codebase/BoxLib/Tools/F_mk/GMakerules.mak:25: recipe for target 'testburn.Linux.PGI.debug.acc.exe' failed
make: *** [testburn.Linux.PGI.debug.acc.exe] Error 2

One odd aspect of this is that my build commands never ask for testburn.o to be made. If I build with GNU, this object file is never made. For some reason PGI creates this object file in the working directory. But this may not matter, as it’s also made on the other machine where the code successfully builds and runs.

Any ideas on what the problem is? Has 16.9 been tested with CUDA 8.0 and a Pascal GPU? Much thanks!

It seems a workaround is to keep re-ordering my object files (treating testburn.f90 like it’s testburn.o) until the linker is satisfied. However, this seems like a compiler bug. I wouldn’t think the order should matter, and it doesn’t for GNU compilers or the same PGI compiler on a different machine.

It’s a problem with nvlink, not the compiler, so unfortunately it’s not something we here at PGI can fix. I’ll send a note to the nvlink folks, though, and see what can be done.

  • Mat

Thanks Mat! If you hear anything back, please let me know. I’ve developed a branch of our code that can build with object files ordered by dependency on the link line, but I wouldn’t want to put this in the production version of our code (unless it is an explicit requirement of PGI 16.9+ or nvlink that object files must be ordered by dependency – I don’t think this is the case). Once this nvlink bug is fixed, I can go back to using our original build infrastructure.

Also, I’ll note that a colleague of mine was able to reproduce the error on another machine with a Pascal GPU and the CUDA 8.0 (8.0.44, to be specific) compiler/linker using PGI 16.9. I’m not sure whether the Pascal GPU matters or whether it only matters that you have the CUDA 8.0 driver.

@AMJacobs - I also ran into exactly the same problem while trying to apply CUDA 8 to a project with hundreds of Fortran sources (a Japanese weather model). Would you be able to share your solution?

Quick update: I’ve put together a quick fix for this issue. The following script, run without arguments in a flat source folder, produces a list of object files in order of their dependency:

It is based on

and

http://code.activestate.com/lists/python-list/407341/
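
For anyone who needs a starting point, here is a minimal sketch of that kind of dependency ordering. It is not the script referred to above. It assumes free-form Fortran sources in one flat folder, one object file per source file, and that a source depends on whichever file defines the modules it pulls in with `use`. The names `parse_source` and `order_objects` are made up for this illustration. It lists providers before users; reverse the list if your link step turns out to want the opposite.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from pathlib import Path

# "module foo", but not "module procedure foo" or "end module foo"
MODULE_RE = re.compile(r"^\s*module\s+(?!procedure\b)(\w+)",
                       re.IGNORECASE | re.MULTILINE)
# "use foo", "use foo, only: bar", "use, intrinsic :: foo"
USE_RE = re.compile(r"^\s*use\b\s*(?:,\s*\w+\s*)?(?:::)?\s*(\w+)",
                    re.IGNORECASE | re.MULTILINE)


def parse_source(text):
    """Return (modules defined, modules used) as lower-case name sets."""
    defined = {m.lower() for m in MODULE_RE.findall(text)}
    used = {m.lower() for m in USE_RE.findall(text)}
    return defined, used


def order_objects(sources):
    """sources maps file name -> file contents.

    Returns object file names with each file's dependencies listed
    before the file itself. Raises graphlib.CycleError on a cycle.
    """
    defined_in = {}  # module name -> file that defines it
    uses = {}        # file name -> modules it uses
    for name, text in sources.items():
        defined, used = parse_source(text)
        for mod in defined:
            defined_in[mod] = name
        uses[name] = used

    graph = {}
    for name in sorted(sources):
        # Modules not defined in this folder (e.g. intrinsic ones) are ignored.
        deps = {defined_in[m] for m in uses[name] if m in defined_in}
        deps.discard(name)  # a file does not depend on itself
        graph[name] = deps

    ordered = TopologicalSorter(graph).static_order()
    return [str(Path(f).with_suffix(".o")) for f in ordered]


if __name__ == "__main__":
    # Run without arguments in a flat source folder.
    exts = {".f", ".f90", ".f95"}
    files = sorted(p for p in Path(".").iterdir() if p.suffix.lower() in exts)
    srcs = {p.name: p.read_text(errors="replace") for p in files}
    print(" ".join(order_objects(srcs)))
```

The printed list can be pasted into the link line in place of the alphabetically sorted objects. C and C++ objects are not scanned, so they would still need to be appended by hand.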