elementary problem with linker - undefined references

Up to now I’ve been experimenting with the Accelerator using single file programs. Today I tried to build a life-sized program with the PGI tools (not an Accelerator program) and I’m having a problem with getting the linker to see a library. I feel this problem is of the most elementary nature, but I don’t see what I’m doing wrong.

All my files compile without errors and then the linkage step starts with:

pgfortran main.o climate.o vegetation.o biogeochem.o initial.o humidity.o \
        radiation.o readpars.o canopy.o physiology.o snow.o soil.o utilities.o \
        ctemfire.o disturbance.o distmats.o io-cfs.o ies-io.o math.o stats.o trapfpe.o \
        -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -fastsse -Mfixed -Mextend -O3 -I/data/capa1/netcdf-4.1.1/centos64-pgi/include -L/data/capa1/netcdf-4.1.1/centos64-pgi/lib -o ibis

and I’m getting “undefined reference” errors for functions in the netCDF library that’s in the directory indicated by the -L flag, for example:

main.o: In function `main':
main.f:639: undefined reference to `nf_open_'
main.f:678: undefined reference to `nf_close_'

Checking to see that netCDF library has the required symbols:

s-edm-mahler:/data/capa1/netcdf-4.1.1/centos64-pgi/lib> nm -g libnetcdf.a | grep nf_open_
                 U nf_open_
00000000000001e0 T nf_open_

So the symbols are in the library but the linker doesn’t find them. What am I doing wrong?

===
Note: I feel this is a simple problem with getting the linker to look in the right place (is there a way to ask it where it’s looking?), rather than a problem with the build tools or the library, but here are a few more details about that in case it might be relevant.

The netCDF library was built following the instructions at http://www.pgroup.com/resources/netcdf/netcdf40_pgi2010.htm (except that I discovered towards the end of that note that I should have built it with -fastsse and will have to build it again, but that wouldn’t explain the undefined symbols).

I have the “switch -bind_at_load is replace(-bind_at_load) positional(linker);” line in my siterc file (as recommended in the above document) although I don’t think that’s immediately relevant.

I’m using pgfortran 10.2-1 64-bit target on x86-64 Linux -tp nehalem-64.

===

Thank you for any suggestions you might have towards resolving this problem.

Regards,
Neil.

Further to the above, I tried changing -L to -l to try to be more specific. This was the rather strange result:

pgfortran main.o climate.o vegetation.o biogeochem.o initial.o humidity.o \
        radiation.o readpars.o canopy.o physiology.o snow.o soil.o utilities.o \
        ctemfire.o disturbance.o distmats.o io-cfs.o ies-io.o math.o stats.o trapfpe.o \
        -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -fastsse -Mfixed -Mextend -O3 -I/data/capa1/netcdf-4.1.1/centos64-pgi/include -l/data/capa1/netcdf-4.1.1/centos64-pgi/lib/libnetcdf.a -o ibis
/usr/bin/ld: cannot find -l/data/capa1/netcdf-4.1.1/centos64-pgi/lib/libnetcdf.a
make: *** [ibis] Error 2

$ ls /data/capa1/netcdf-4.1.1/centos64-pgi/lib/libnetcdf.a
/data/capa1/netcdf-4.1.1/centos64-pgi/lib/libnetcdf.a

My first question is, should I be getting the gnu linker? Isn’t there a linker in with the PGI build tools?

My second question is why can’t the linker see the library when ls can see it fine from the same working directory?

Hi njackson,

My first question is, should I be getting the gnu linker? Isn’t there a linker in with the PGI build tools?

We use the system’s linker (ld on Linux/MacOSX, link on Windows) and do not ship our own.

My second question is why can’t the linker see the library when ls can see it fine from the same working directory?

ld translates “-lname” into “libname.a” or “libname.so” and looks for that file in its search path. The “-L” flag only adds a directory to that search path; it does not link anything by itself. So the correct flags to use are “-L/data/capa1/netcdf-4.1.1/centos64-pgi/lib -lnetcdf”.
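The lookup can be sketched in a few lines of Python. This is only an illustration of the naming rule, assuming GNU ld's documented default of trying libNAME.so before libNAME.a in each directory; `resolve_library` is a hypothetical helper, not part of any tool, and a temporary directory stands in for the real netCDF lib directory.

```python
import os
import tempfile


def resolve_library(name, search_dirs):
    """Return the first file a `-lNAME` lookup would match, or None.

    Mimics the default rule: in each search directory, in order,
    try libNAME.so and then libNAME.a.
    """
    for directory in search_dirs:
        for suffix in (".so", ".a"):
            candidate = os.path.join(directory, "lib" + name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


# Throwaway directory standing in for .../centos64-pgi/lib (an assumption).
lib_dir = tempfile.mkdtemp()
archive = os.path.join(lib_dir, "libnetcdf.a")
open(archive, "w").close()

# "-L<lib_dir> -lnetcdf": the short name expands to libnetcdf.a and is found.
found = resolve_library("netcdf", [lib_dir])

# "-l<full path to libnetcdf.a>": the linker would look for
# "lib<full path>.so" / "lib<full path>.a", which does not exist.
not_found = resolve_library(archive, [lib_dir])

print(found == archive, not_found)  # prints: True None
```

The second call mirrors the “cannot find -l/data/...” message: the file exists, but not under the name that the “-l” expansion produces.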

Hope this helps,
Mat

Thanks Mat,

After I realised that I was dealing with the gnu linker, I spent a little time with its documentation and found the two key pieces of information that you also pointed out.

(It seems to me a bit of an odd approach, to refer to a library by one name and then put it in a file with a somewhat different name. I guess I’m from a world where abc.lib is stored in a file called abc.lib not from a world where abc.lib would be stored in a file called libabc.lib.a!)

My program now builds perfectly. (And produces substantially different numerical results from the 32-bit version built with g77. But that is another matter, which I am now looking into.)

It seems to me a bit of an odd approach, to refer to a library by one name and then put it in a file with a somewhat different name. I guess I’m from a world where abc.lib is stored in a file called abc.lib not from a world where abc.lib would be stored in a file called libabc.lib.a!

Just one of the many differences between Windows and Linux.

And produces substantially different numerical results from the 32-bit version built with g77. But that is another matter, which I am now looking into

Instead of compiling with “-fastsse -O3”, try each of the following flag combinations.

“-O2 -tp piii”
“-O2 -tp piii -pc 64”
“-O0 -Kieee”
“-fast -Kieee”
“-fastsse -Kieee”

How do the results compare at each of these opt levels? Also what flags are you using for g77?

My best guess is that it’s an 80-bit x87 versus 64-bit SSE issue, but running with the above flags will tell us. (For details, please see the top two items in http://www.pgroup.com/support/execute.htm)

  • Mat

The machine on which I’m trying to get a good pgfortran build is a Centos machine with two Intel Xeon 5500 Series processors (and three TESLA C1060 cards, which we’ll be using later once the program runs well).

Currently the program runs from a g77 or gfortran build on 32-bit openSuse and Centos desktops and servers with Intel Pentium family processors.

It runs on this new machine from a g77 or gfortran build (and gets identical results to those it gets on the other machines), when I build it with these flags:

-ff2c -g -Wall -Wno-unused-variable -Wno-unused-labels -march=pentium4 -mfpmath=sse -malign-double -m32 -ffixed-line-length-132 -static -O3

(I was building in 32-bit because I don’t yet have a good build of 64-bit netCDF libraries for these compilers (and the 32-bit libraries are built to the f2c conventions) and also because I wanted initially to use the same makefile for all the machines that run the program.)

===
After my last post I discovered that simply removing the -fastsse flag “fixed” the problem of the widely different results.

So

-Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fastsse -O3

gave the apparently wrong results, whereas

-Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -O3

gives results very close to those I see from 32-bit g77- and gfortran-built programs.

===
After reading your post, Mat, I did runs with the several different combinations of switches as follows:

“-O2 -tp piii”: Build fails
“-O2 -tp piii -pc 64”: Build fails
“-O0 -Kieee”: “Good” results (These are listed as “pgi 3” in the results below)
“-fast -Kieee”: “Very good” results (These are listed as “pgi 5” below)
“-fastsse -Kieee”: Listed below as “pgi 4”; identical results to those with -fast rather than -fastsse.

(In the first two cases the build fails with the error message:

“PGC-F-0155-built-in __m128, __m128d, __m128i data types require compilation for 64-bit architectures or 32-bit architectures that support SSE1 and SSE2 instructions.”

And this is fine, because we won’t be wanting to build a 32-bit version on this machine anyway.)

===
The results for 26 key variables are shown below, both as numerical values and as percentage differences from the gfortran results (chosen arbitrarily as the reference). (Some of these values are derived from multistep calculations performed hourly for about a hundred years, so it is expected that a certain amount of error will accumulate.)

The column headings refer to the following sets of compiler flags:

gfortran = -ff2c -g -Wall -Wno-unused-variable -Wno-unused-labels -march=pentium4 -mfpmath=sse -malign-double -m32 -ffixed-line-length-132 -static -O3
g77 = -ff2c -g -Wall -Wno-unused-variable -Wno-unused-labels -march=pentium4 -mfpmath=sse -malign-double -m32 -ffixed-line-length-132 -static -O3
pgi 1 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend
pgi 2 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -O3
pgi 3 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -O0 -Kieee
pgi 4 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fastsse -Kieee
pgi 5 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fast -Kieee
pgi 6 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fastsse
pgi 7 = -Mdefaultunit -gopt -Minfo=ccff -Mdclchk -Mfixed -Mextend -fastsse -O3

Numerical Results for 26 key variables:

 gfortran     g77          pgi 1        pgi 2        pgi 3        pgi 4 & 5    pgi 6 & 7
 267365.3901  267262.2062  267664.7928  267588.175   267747.227   267230.7771  138874.5163
 960726.722   960702.466   960951.003   960860.212   960992.998   960737.081   846207.571
1833429.116  1833432.56   1833732.699  1833604.846  1833753.523  1833489.6    1615360.837
   8555.657     8557.483     8559.163     8558.46      8556.856     8556.383     5944.643
1322040.796  1322406.536  1321578.572  1321947.493  1321123.411  1322285.269   998562.271
 521652.51    521839.781   521527.755   521533.469   521474.133   521951.733   684769.25
3387607.073  3387741.123  3388039.897  3387771.44   3387895.981  3387741.382  4096289.429
 769880.265   769967.972   769761.725   769759.905   769704.059   770041.655   820051.104
   7705.6621    7707.02628   7704.30263   7705.4813    7702.58011   7706.5819    6333.14295
   3267.52306   3269.05545   3263.87805   3263.77097   3263.80128   3267.31957   4925.84576
  34584.72     34584.692    34584.839    34584.773    34584.85     34584.691    38973.571
      6.178        6.178        6.178        6.178        6.178        6.178        5.695
      2.325        2.325        2.327        2.326        2.327        2.326        2.149
   1233.74      1233.74      1233.74      1233.74      1233.74      1233.74      1233.74
    229.059      229.069      229.082      229.071      229.077      229.076      216.205
    107.817      107.822      107.821      107.814      107.819      107.814      101.169
   1107.247     1107.244     1107.253     1107.256     1107.254     1107.251       15.689
     79.503       79.503       79.501       79.504       79.502       79.5         10.741
   1027.744     1027.741     1027.751     1027.752     1027.752     1027.751        4.948
      4.874        4.873        4.874        4.875        4.875        4.874      -21.402
   1824.79      1824.79      1824.791     1824.791     1824.791     1824.79       836.677
    228.517      228.517      228.517      228.517      228.517      228.517      151.078
      0.186        0.186        0.186        0.186        0.186        0.186        0.175
      0.897        0.897        0.897        0.897        0.897        0.897        0.013
      0.471        0.471        0.471        0.471        0.471        0.471        0.468
      0.072        0.072        0.072        0.072        0.072        0.072        0.685



Percentage discrepancy (from 32-bit gfortran results) for the 26 key variables:

gfortran  g77  pgi 1 pgi 2 pgi 3 pgi 4&5  pgi 6&7
  0.00   -0.00  0.11  0.08  0.14  -0.05    -48.06
  0.00    0.00  0.02  0.01  0.03   0.00    -11.92
  0.00    0.00  0.02  0.01  0.02   0.00    -11.89
  0.00    0.02  0.04  0.03  0.01   0.01    -30.52
  0.00    0.03 -0.03 -0.00 -0.07   0.02    -24.47
  0.00    0.04 -0.02 -0.00 -0.03   0.06     31.27
  0.00    0.00  0.01  0.00  0.01   0.00     20.92
  0.00    0.01 -0.02 -0.00 -0.02   0.02      6.52
  0.00    0.02 -0.02  0.00 -0.04   0.01    -17.81
  0.00    0.05 -0.11 -0.10 -0.11  -0.01     50.75
  0.00    0.00  0.00  0.00  0.00   0.00     12.69
  0.00    0.00  0.00  0.00  0.00   0.00     -7.82
  0.00    0.00  0.09  0.04  0.09   0.04     -7.57
  0.00    0.00  0.00  0.00  0.00   0.00      0.00
  0.00    0.00  0.01  0.01  0.01   0.01     -5.61
  0.00    0.00  0.00  0.00  0.00   0.00     -6.17
  0.00    0.00  0.00  0.00  0.00   0.00    -98.58
  0.00    0.00  0.00  0.00  0.00   0.00    -86.49
  0.00    0.00  0.00  0.00  0.00   0.00    -99.52
  0.00   -0.00  0.00  0.02  0.02   0.00   -539.11
  0.00    0.00  0.00  0.00  0.00   0.00    -54.15
  0.00    0.00  0.00  0.00  0.00   0.00    -33.89
  0.00    0.00  0.00  0.00  0.00   0.00     -5.91
  0.00    0.00  0.00  0.00  0.00   0.00    -98.55
  0.00    0.00  0.00  0.00  0.00   0.00     -0.64
  0.00    0.00  0.00  0.00  0.00   0.00    851.39
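For reference, the percentages above appear to be computed as 100 × (value − gfortran value) / gfortran value. That formula is an assumption, but it reproduces the “pgi 6&7” entries checked here. A minimal Python sketch, with `percent_discrepancy` as a hypothetical helper name:

```python
def percent_discrepancy(value, reference):
    """Signed difference of value from reference, as a percentage of reference."""
    return 100.0 * (value - reference) / reference


# First row: gfortran (reference) against the "pgi 6 & 7" column.
row1 = percent_discrepancy(138874.5163, 267365.3901)

# Last row: gfortran 0.072 against "pgi 6 & 7" 0.685.
row26 = percent_discrepancy(0.685, 0.072)

print(f"{row1:.2f} {row26:.2f}")  # prints: -48.06 851.39
```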

===
I’m slightly confused by the huge discrepancy in the results for the runs pgi 6 & 7. Does this represent a bug, or could such a huge difference result from legitimately different floating point implementations?

Thanks.

I’m slightly confused by the huge discrepancy in the results for the runs pgi 6 & 7. Does this represent a bug, or could such a huge difference result from legitimately different floating point implementations?

It could be a bug, but it is more likely due to the use of faster but less precise intrinsics, or to vectorization. Do you make heavy use of sin, cos, exp, log, etc.? Are you using single or double precision values?

Let’s try a few more flag sets to see if it’s the fast math routines (which are disabled with -Kieee) or vectorization.

“-fast” ! Includes the fast math routines but not vectorization
“-fast -Mvect=sse” ! Add in vectorization
“-fast -Mflushz” ! Enable Flush-To-Zero Mode for denormals
“-fastsse -r8” ! set the REAL default kind to double precision

If you can determine that it is vectorization or the intrinsics, you may want to narrow it down to the routine and/or loop that’s causing the divergence. GPUs are basically very large vector processors, so any numerical issues caused by vectorization will be amplified on a GPU. Also, a GPU’s trigonometric functions can be less precise (~3 ulps) than PGI’s fast math routines, which are no more than 1 ulp off.
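One reason vectorization alone can change results is that a vectorized reduction typically keeps one partial sum per vector lane and combines them at the end, which reorders the additions. The sketch below imitates that in plain Python doubles; it is not PGI's generated code, and the data values are chosen purely to make the effect visible.

```python
def sequential_sum(values):
    """Add the values strictly left to right, as a scalar loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes=2):
    """Keep one partial sum per lane, then combine the lanes at the end."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


# Contrived data: 1.0 is below the rounding granularity of 1e16.
data = [1e16, 1.0, -1e16, 1.0]

scalar = sequential_sum(data)  # the first 1.0 is absorbed by 1e16 and lost
vector = lane_sum(data)        # the big values cancel in one lane, both 1.0s survive

print(scalar, vector)  # prints: 1.0 2.0
```

Both answers are legitimate floating-point results for the same mathematical sum, which is why a loop-level comparison is usually needed to tell reordering effects apart from a real bug.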

  • Mat

Thank you Mat,

I’ve only been working with the code (about 40 000 lines of it) for a few weeks and haven’t looked into the guts of the science calculations much, but there are definitely logs and exponentials and some trig functions.

The profiler reports that (with -fastsse -Kieee) 13% of the execution time is spent in __mth_i_exp, 8% in __mth_i_dexp2, 7% in __mth_i_dlog2, and 3% in __mth_i_rpowr.

We’re using a mixture of single and double precision variables declared as real*4 and real*8, and some older single precision variables declared as real.

We’re using the real*8 variables where a previous analysis suggested we were losing too much precision to rounding during the calculations. We don’t need the precision in the final results; measured values of the same variables could expect to be good to not much better than three or four significant figures, if that.
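To give a feel for how much rounding can build up, here is a small Python sketch that accumulates a value once per hour for about a hundred years, in single and in double precision. The increment of 0.1 is a made-up number, not taken from the model, and single precision is emulated by rounding through the `struct` module after every addition.

```python
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


steps = 24 * 365 * 100   # hourly steps over roughly a hundred years
increment = 0.1          # hypothetical per-step contribution

single_total = 0.0
double_total = 0.0
for _ in range(steps):
    # Emulated real*4 accumulator: round after every addition.
    single_total = to_single(single_total + to_single(increment))
    # real*8 accumulator: ordinary Python float arithmetic.
    double_total += increment

exact = steps * increment
single_error = abs(single_total - exact) / exact
double_error = abs(double_total - exact) / exact

print(f"single: {single_error:.2e}  double: {double_error:.2e}")
```

The single-precision accumulator drifts by many orders of magnitude more than the double-precision one, which is the kind of loss that promoting selected variables to real*8 avoids.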

Testing out the four flag settings you suggested, the first three (“-fast”, “-fast -Mvect=sse”, and “-fast -Mflushz”) give results identical to the “wrong” results I had before with “-fastsse” and “-fastsse -O3”.

Running with -r8 is not going to be possible for this program. At least not any time in the near future. This is because it has some horrible procedures for initialising and for copying arrays which are completely not typesafe. The program was clearly written with the assumption that a real was always going to be four bytes. So if the variables declared real become real*8 the program is going to segmentation fault almost before it starts running! (And it duly did so when I tried it.)
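The size mismatch can be illustrated with a short Python sketch. The copy routine here is hypothetical (the program's actual routines are not shown in this thread); it just hard-codes 4 bytes per real, as the code appears to assume.

```python
import struct

n = 4
values = [1.0, 2.0, 3.0, 4.0]

# A copy routine that hard-codes "a real is 4 bytes" moves n * 4 bytes.
bytes_copied = n * 4

# Storage actually needed for n reals at each default kind.
real4_bytes = len(struct.pack(f"<{n}f", *values))  # 4 bytes per element
real8_bytes = len(struct.pack(f"<{n}d", *values))  # 8 bytes per element

# With 8-byte reals the hard-coded copy covers only half of the array.
elements_covered = bytes_copied // 8

print(real4_bytes, real8_bytes, elements_covered)  # prints: 16 32 2
```

Anything that reads or writes past the half-copied region is then working with the wrong memory, which is consistent with an early segmentation fault.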

It might run with -r4. If so, that might provide some information and I’m going to try that next.

I’m not clear though what Page 235 of the PGI Users’ Guide is trying to say about -r4 and -r8. Do these flags only affect variables ambiguously declared as “real”, or does -r8 force variables declared “real*4” to be real*8; does -r4 force variables declared “real*8” to be real*4?

Thanks and best regards,
Neil.

Hi Neil,

I’m not clear though what Page 235 of the PGI Users’ Guide is trying to say about -r4 and -r8. Do these flags only affect variables ambiguously declared as “real”, or does -r8 force variables declared “real*4” to be real*8; does -r4 force variables declared “real*8” to be real*4?

The flags only affect the default kind, with “-r4” being the default. In other words, “-r8” changes the kind for “REAL” as well as for constants and the intrinsics. It does not change variables with explicitly declared kinds (i.e. REAL*4, REAL*8, etc).

  • Mat