32 bit program halts on writing large file

Hi!

I’ve got a mixed Fortran/C/C++ (but primarily Fortran) application built with pgcc, pgCC, and pgfortran. I’ve added the -Mlfs flag to all compilers, and verified that it is passed both at compile and link time to the main programs and all “internal libraries”. However, the program fails as the (unformatted) file hits 2**31 - 1 bytes in size.

Any ideas for further debugging?

Operating system is Ubuntu 14.04 LTS 64 bit; PGI version is 15.7.

Paul

Hi Paul,

Are you compiling in 32 or 64-bits? -Mlfs is only needed when compiling in 32-bits.

I’m wondering if it’s not the file but rather an internal data structure. How are you storing the data?

If it’s 64-bits, then you might need to add “-mcmodel=medium” for static data structures, or “-Mlarge_arrays” for dynamically allocated structures.

You may also need to add “-i8” or make sure you’re using INTEGER(kind=8) data types for your indices.

- Mat

Hi Mat!

Relevant questions. As in the title, I’m compiling the program in 32 bit mode, since we’ve got a compatibility issue with 64 bit which we haven’t figured out yet. The -tp k8-32 flag is used and the pgi32 environment module is loaded - belt and braces.

The code in question is old and has made assumptions on how much space a default integer occupies, so we can’t play around with int sizes. This cannot be the issue since the same code has run on the same input file for several years.

Our code has supported writing large files for many years. No idea why this showed up now; I did discover that we’d forgotten about -Mlfs on the pgcc and pgc++ compilers in a recent Makefile refactoring, but now that that’s fixed, I’ve no idea what the issue may be.

Paul

Apologies for missing the title.

I’m not sure what’s wrong, then. Try setting the define flag “-D_FILE_OFFSET_BITS=64”, though. This is the GNU way of enabling large file support, and now that our C++ compiler is GNU compatible, I’m wondering whether it needs to be added as well.

OK, thanks for the advice, if somewhat confusing. You’re telling me that PGI compilers accept GNU macros to determine whether large files are supported?

Anyway, I tried adding (various combinations of) these macros, and all attempts have failed at the 2**31-1 byte limit.

-D_FILE_OFFSET_BITS=64
-D_LARGEFILE64_SOURCE
-D_LARGEFILE_SOURCE

This really puts us in a bad situation, since we’re dependent on large file support in order to support our users.

Hi Paul,

I’ve tried a few tests here and they work fine so I’m not reproducing the error correctly.

Can you send PGI Customer Service (trs@pgroup.com) a reproducing example so we can see what’s going on?

Thanks,
Mat

Well, this should show the problem I think. pgfortran gives an error I can’t decipher, and gfortran happily prints a 32 GB file.

% cat big.f90
program big
      integer, parameter :: myunit = 42
      integer(8), parameter :: N = 2_8**32_8
      integer(8) :: i

      open(unit=myunit, file="foo.big", access="stream")
      do i = 1, N
          write(myunit) i
      end do
end program big
% pgfortran -Mlfs -Minform=warn big.f90 && ./a.out
PGFIO/stdio: Value too large for defined data type
PGFIO-F-/OPEN/unit=42/error code returned by host stdio - 75.
 File name = foo.big
 In source file big.f90, at line number 6
% pgfortran -Minform=warn big.f90 && ./a.out
PGFIO/stdio: Value too large for defined data type
PGFIO-F-/OPEN/unit=42/error code returned by host stdio - 75.
 File name = foo.big
 In source file big.f90, at line number 6
% gfortran -m32 big.f90 && ./a.out
% du -hs foo.big
33G     foo.big

a.out is definitely a 32 bit executable; I checked after compiling with both pgi32 and gfortran -m32.

% file a.out    
a.out: ELF 32-bit LSB  executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=21371ad30ca73b0628813f21ee605e4a72015f76, not stripped

I tested the same program with the Intel compiler, and it also writes big files automagically. Ifort claims to be “gfortran compatible” with the usual options like -m32 and so forth. No magic macros necessary.

% ifort -m32 big.f90
% ./a.out

Results in nice big file; and yes, it’s 32 bit

% file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=2cb636801c92f2c5f3293832fdd3637ceb31b2b1, not stripped

Hi Paul,

I see a completely different issue. For me the “i” loop doesn’t get executed unless I make the loop trip count smaller. I can still generate a 32GB file by putting in an inner loop.

What OS are you running? Is it a 32-bit OS? I’ve tried a few different systems but all are 64-bit OS.

% cat big.f90
      program big
      integer, parameter :: myunit = 42
      integer(8), parameter :: N = 2_8**32_8
      integer(8) :: i,j
      print *, "N=",N
      open(unit=myunit, file="foo.big", access="stream")
      do i = 1, (N+999)/1000
        do j = 1, 1000
          write(myunit), (i*1000)+j
        end do
      end do
      close(myunit)
      end program big

% pgfortran -V

pgfortran 15.10-0 32-bit target on x86-64 Linux -tp sandybridge
The Portland Group - PGI Compilers and Tools
Copyright (c) 2015, NVIDIA CORPORATION.  All rights reserved.
% pgfortran -Mlfs big.f90
% file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
% ./a.out
 N=               4294967296
% du -hs foo.big
33G     foo.big

FYI, here’s my test code. Can you give it a try?

        program giga
        implicit double precision (a-h,o-z)
        dimension x(1000)
!       print *,"Assume you linked -Mlfs, and "
        print *,"We assume you have enough disk space"
!       print *,"you have enough disk space"
        print *,' Input number of gb to write'
        read(*,*)ngig
        print *,"will try to create",ngig,"gbyte file bigun"
        loops=ngig*1000000/8
        value=0.0
        open(32,file='bigun',form='unformatted')
        do k=1,loops
          do i=1,1000
             x(i)=value + i
          end do
          value=value+1000
          write(32)(x(j),j=1,1000)
        enddo
        close(32)
        print *,"Looking for file bigun"
        i=system("ls -l bigun")
        print *,"re-opening bigun for reading"
        open(32,file='bigun',form='unformatted')
        value=0.0
        do k=1,loops
          read(32)(x(j),j=1,1000)
          do i=1,1000
            if(x(i) .ne. i+value) print *,"error! i=",i+value
          end do
          value=value+1000
        enddo
        close(32)
!       print *,"removing bigun"
!       i=system("unlink bigun")
        stop
        end

First, I don’t see what’s wrong with my original test code, and neither do the GNU and Intel compilers. Second, your code gives me an error that I don’t understand one bit of.

% cat big.f90
program big
integer, parameter :: myunit = 42
integer(8), parameter :: N = 2_8**32_8
integer(8) :: i, j

print *, "N=",N
open(unit=myunit, file="foo.big", access="stream")
do i = 1, (N+999)/1000
do j = 1, 1000
write(myunit), (i*1000)+j
end do
end do
close(myunit)
end program big
% pgfortran -Mlfs big.f90 && ./a.out
N= 4294967296
PGFIO-F-215/unformatted write/unit=42/formatted/unformatted file conflict.
File name = foo.big formatted, stream access record = 0(null) In source file big.f90, at line number 10
% pgfortran big.f90 && ./a.out
N= 4294967296
PGFIO-F-215/unformatted write/unit=42/formatted/unformatted file conflict.
File name = foo.big formatted, stream access record = 0(null) In source file big.f90, at line number 10
% pgfortran -V

pgfortran 15.7-0 32-bit target on x86-64 Linux -tp sandybridge
The Portland Group - PGI Compilers and Tools
Copyright © 2015, NVIDIA CORPORATION. All rights reserved.

My OS is Ubuntu 14.04 64 bit as I believe I stated earlier. Gory details:

% uname -a
Linux bender.sintef.no 3.16.0-55-generic #74~14.04.1-Ubuntu SMP Tue Nov 17 10:15:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Finally, ifort and gfortran both compile and run your example code without hiccups under the -m32 flag, i.e. 32 bit executable. Again, I’m suspicious of the PGI compiler, as it’s the “odd man out”.

> My OS is Ubuntu 14.04 64 bit as I believe I stated earlier.

You did but I failed to go back and look at the original post. My apologies.

It appears that this isn’t a problem with 32-bit large files but rather a more generic issue with “stream”, which is why my large file test program works fine. The problem was that when using stream access without a “form” specifier, we were defaulting to “formatted”. In your case, this had the side effect of limiting the file size.

Another user reported the problem in August 2015 and it was logged as TPR#21924. We fixed the issue in the 15.10 release, when we changed the default form to “unformatted”.

If I manually set your program to use “form=‘unformatted’”, or use PGI 15.10, then it runs correctly:

% cat big.f90
program big
integer, parameter :: myunit = 42
integer(8), parameter :: N = 2_8**32_8
integer(8) :: i, j

print *, "N=",N
open(unit=myunit, file="foo.big", access="stream", form="unformatted")
do i = 1, (N+999)/1000
do j = 1, 1000
write(myunit), (i*1000)+j
end do
end do
close(myunit)
end program big
% pgfortran -Mlfs big.f90 -V15.7 -m32 && ./a.out
N= 4294967296
% du -hs foo.big
33G foo.big

- Mat

Hi Mat!

Finally your example runs on my PGI too. However, I’m still stuck at 2 G file size.

% pgfortran -Mlfs …/big.f90
% ./a.out
N= 4294967296
PGFIO/stdio: File too large
PGFIO-F-/unformatted write/unit=42/error code returned by host stdio - 27.
File name = foo.big unformatted, stream access record = 0(null) In source file …/big.f90, at line number 10
% du -hs foo.big
2.1G foo.big
% file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, not stripped

At this point it feels silly to repeat, but again, gfortran and ifort happily run your code compiled to a 32 bit executable and produce nice, big files.

Hi Paul,

This doesn’t make much sense. There’s some disconnect since the only way I can reproduce this “File too large” error is by removing “-Mlfs”. I’ve asked our compiler engineers and there shouldn’t be any other way you’d get this message unless “-Mlfs” wasn’t added.

Maybe the “a.out” you’re running is from an earlier compile?

Can you please post the output from the following commands.

“pgfortran -v -Mlfs …/big.f90 -o ./big.out; ./big.out”

The verbose flag, -v, will show what internal flags are being passed to the compiler, and outputting to a different binary name will ensure that you’re not picking up an old binary.

- Mat

OK, one more iteration. Sense or not - it fails as before. Details below. I’ll keep the -v flag in mind. I can’t see anything wrong though, the -Mlfs flag is clearly echoed back.

Paul

~/debug/bigfile2 % module load pgi32         
~/debug/bigfile2 % module list           
Currently Loaded Modulefiles:
  1) pgi32/15.7
~/debug/bigfile2 % ls -l         
total 4,0K
-rw-rw-r-- 1 paul paul 378 feb.  11 20:49 big2.f90
~/debug/bigfile2 % pgfortran -v -Mlfs big2.f90 -o ./big.out; ./big.out      
Export PGI=/opt/pgi/15.7

/opt/pgi/15.7/linux86/15.7/bin/pgf901 big2.f90 -opt 1 -nohpf -nostatic -x 119 0x100000 -x 19 0x400000 -x 15 2 -x 49 0x400004 -x 51 0x20 -x 57 0x4c -x 58 0x10000 -x 124 0x1000 -tp sandybridge -x 57 0xfb0000 -x 58 0x78031040 -x 47 0x08 -x 48 3328 -stdinc /opt/pgi/15.7/linux86/15.7/include-gcc48:/opt/pgi/15.7/linux86/15.7/include:/usr/lib/gcc/x86_64-linux-gnu/4.8/include:/usr/local/include:/usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed:/usr/include -cmdline '+pgfortran big2.f90 -v -Mlfs -o ./big.out' -def unix -def __unix -def __unix__ -def linux -def __linux -def __linux__ -def i386 -def __i386 -def __i386__ -def __NO_MATH_INLINES -def linux86 -def __THROW= -def __extension__= -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -def __STDC_HOSTED__ -freeform -vect 48 -y 54 1 -y 163 0xc0000000 -modexport /tmp/pgfortranxy1dfaMnVDJR.cmod -modindex /tmp/pgfortranpy1dTa2bOFtr.cmdx -output /tmp/pgfortranNy1d1TSEdx34.ilm
  0 inform,   0 warnings,   0 severes, 0 fatal for big
PGF90/x86 Linux 15.7-0: compilation successful

/opt/pgi/15.7/linux86/15.7/bin/pgf902 /tmp/pgfortranNy1d1TSEdx34.ilm -fn big2.f90 -opt 1 -x 51 0x20 -x 119 0xa10000 -x 119 0x100000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 119 0x40000000 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 117 0x1000 -x 119 0x10000000 -tp sandybridge -x 120 0x1000 -x 124 0x1400 -y 15 2 -x 57 0x3b0000 -x 58 0x48000000 -astype 0 -x 124 1 -y 163 0xc0000000 -y 189 0x4000000 -cmdline '+pgfortran big2.f90 -v -Mlfs -o ./big.out' -asm /tmp/pgfortranhy1dvgguqkdG.s
  0 inform,   0 warnings,   0 severes, 0 fatal for big
PGF90/x86 Linux 15.7-0: compilation successful

/usr/bin/as /tmp/pgfortranhy1dvgguqkdG.s --32 -o /tmp/pgfortran3y1dLaUsDg7a.o

/opt/pgi/15.7/linux86/15.7/bin/pgappend -noerror /tmp/pgfortran3y1dLaUsDg7a.o -name .IPDINFO /tmp/pgfortranxy1dfaMnVDJR.cmod -name .IPEINFO /tmp/pgfortranpy1dTa2bOFtr.cmdx

/usr/bin/ld /usr/lib32/crt1.o /usr/lib32/crti.o /opt/pgi/15.7/linux86/15.7/lib/trace_init.o /usr/lib/gcc/x86_64-linux-gnu/4.8/32/crtbegin.o /opt/pgi/15.7/linux86/15.7/lib/initmp.o /opt/pgi/15.7/linux86/15.7/lib/f90main.o -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /opt/pgi/15.7/linux86/15.7/lib/pgi.ld -L/opt/pgi/15.7/linux86/15.7/liblf -L/opt/pgi/15.7/linux86/15.7/lib -L/usr/lib32 -L/usr/lib/gcc/x86_64-linux-gnu/4.8/32 /tmp/pgfortran3y1dLaUsDg7a.o -rpath /opt/pgi/15.7/linux86/15.7/liblf -rpath /opt/pgi/15.7/linux86/15.7/lib -o ./big.out -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc /usr/lib/gcc/x86_64-linux-gnu/4.8/32/crtend.o /usr/lib32/crtn.o
Unlinking /tmp/pgfortranNy1d1TSEdx34.ilm
Unlinking /tmp/pgfortranFy1dDCYdklgt.stb
Unlinking /tmp/pgfortranxy1dfaMnVDJR.cmod
Unlinking /tmp/pgfortranpy1dTa2bOFtr.cmdx
Unlinking /tmp/pgfortranhy1dvgguqkdG.s
Unlinking /tmp/pgfortran-y1d94wuFY1n.ll
Unlinking /tmp/pgfortran3y1dLaUsDg7a.o
 N=               4294967296
PGFIO/stdio: File too large
PGFIO-F-/unformatted write/unit=42/error code returned by host stdio - 27.
 File name = foo.big    unformatted, stream access   record = 0(null) In source file big2.f90, at line number 11

The main difference with or without -Mlfs is which set of runtime libraries you link with. With -Mlfs, the “liblf” libraries are used; without it, those found in “lib” are used. Since there are a few libraries only included in “lib”, the link command includes both paths “-L/opt/pgi/15.7/linux86/15.7/liblf -L/opt/pgi/15.7/linux86/15.7/lib”.

If the “liblf” directory was removed from your installation or if you didn’t have read permissions on the directory, your link would still succeed, but resolve to the “lib” directory. It’s a long shot and would mean that your installation is broken, but can you check if “/opt/pgi/15.7/linux86/15.7/liblf” exists and that you have read permissions?

Also, we can check which library it’s picking up by running “ldd” on the binary:

% ldd big.out
        linux-gate.so.1 =>  (0x55576000)
        libnuma.so => /proj/pgi/linux86/15.7/lib/libnuma.so (0x55578000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x55588000)
        libpgc.so => /proj/pgi/linux86/15.7/liblf/libpgc.so (0x555a2000)
        librt.so.1 => /lib/librt.so.1 (0x555de000)
        libm.so.6 => /lib/libm.so.6 (0x555e7000)
        libc.so.6 => /lib/libc.so.6 (0x5562a000)
        /lib/ld-linux.so.2 (0x55555000)

Other than that, I’m really at a loss here. Can you send your “big.out” to PGI Customer Service (trs@pgroup.com)? I can then send you mine as well.

- Mat

OK, I’ve got some data points here. I’ve checked the installations on 3 different machines from different eras running different linuxes with various PGI installations. I’ve done approximately

cd /opt/pgi/15.7
find . -name liblf

on each machine, replacing 15.7 with various version numbers. On some installations I do find the runtime lib you’re talking about; on others, I’m stumped.

- My desktop - ubuntu 14.04
  - PGI 15.7, both 32 and 64 bit installed: no liblf directory
  - PGI 15.10, only 32 bit installed: no liblf directory
- Our new build server - CentOS 7
  - PGI 14.7, both 32 and 64 bit installed: no liblf directory
  - PGI 15.7, both 32 and 64 bit installed: no liblf directory
- Our old build server - SUSE Enterprise Linux 9, uname -a says it was built in 2005
  - PGI 7.0, both 32 and 64 bit installed: liblf directory is present in 32 bit installation tree
  - PGI 9.0, both 32 and 64 bit installed: liblf directory is present in 32 bit installation tree
  - PGI 14.7, only 32 bit installed: no liblf directory

I cd’d into every single “bin” directory and ran

pgf95 -V

and verified compiler versions, too, to ensure that the directory names weren’t screwed up (never trust your sysadmin, right??).

What do we do now? We’re not going to get our new build system up on the ancient SUSE machine, and our code uses features too new for 7.0 and 9.0, so that option is off the table.

One more data point - I installed 16.1 on my desktop and no liblf directory appears. I guess that you’ve dropped it from your installation package somewhere around 2010 and no one noticed because most people moved to 64 bit. If we only knew why our code doesn’t compile with pgi64…

Hi Paul,

This is embarrassing but at least I know I’m not going crazy. My suspicion was correct in that it’s the missing liblf libraries that are causing the error. The culprit, however, appears to be our installation script. While the libraries are part of the installation package, the installer isn’t copying them over to the install tree. They were getting installed here at PGI, which is why everything worked fine for me. I’ve added TPR#22285 and sent it to our manufacturing folks to get it corrected.

As a work-around, you can manually copy the liblf directory from the install package over to the installation tree.

> One more data point - I installed 16.1 on my desktop and no liblf directory appears. I guess that you’ve dropped it from your installation package somewhere around 2010 and no one noticed because most people moved to 64 bit. If we only knew why our code doesn’t compile with pgi64…

The packages just got split so the download size is smaller. One for 64-bit and one for 32-bit. From the download link above, select “32-bit only” from the “target” drop down box to get the 32-bit package.

But you’re correct. 32-bit is rarely used any longer and -Mlfs even less. I do thank you for your persistence in helping track down this problem.

As for the 64-bit problem, are you referring to the getfile size bug that was fixed in 16.1, the missing F2008 features, or is there another issue?

Thanks again,
Mat

So, the story is as follows. Our code won’t compile with 64 bit. It’s 99% Fortran, but a few system calls are done from C (get environment variable, get file size, etc). This part of the code was written before 1990. There are some hard to track bugs that appear if you just flip the 64 bit switch with the PGI compiler (not with others) but that’s surely in our interface code, which is a mess. Since F2008 has most of these things, in fact all of them, I figured I’d just chuck the C code out the window, and use modern Fortran features. This works splendidly on gcc and ifort, but on PGI compilers I’ve found nothing but bugs and troubles: file size, environment variables, and so forth.

Since I didn’t manage to remove the C code for PGI we’re still shipping 32 bit PGI builds. However, we’re now unable to build with large files, so our 32 bit builds are even less capable than they used to be, and our users do need to write large results files sometimes. We had moved too far forward with Fortran features before discovering the -Mlfs bug, so we can’t revert to our old PGI version. With our other development compilers we can use more RAM, write big files, and the 64 bit performance advantages are there, although I haven’t measured them.

All these bugs block any progress with the PGI compiler. I’d like to stop using PGI but the boss says no. Eventually this will change, as we can’t ship a new 64 bit PGI based release as it stands, and the budget is finite. The alternative is to waste a lot of money with C based workarounds for the features missing from PGI, which will get us nowhere in terms of bug fixes or features for the customer to enjoy.

We corrected the issue where the large file support libraries for the
32-bit Linux compiler products were not copied into the install directories.
They were in the download, but did not make it into the installation.
This has been fixed in the current 16.3 release.


dave