Failure of Fortran unformatted reads of files over 2 GB

kb64 · October 25, 2011, 9:15am

We have been writing large binary files of less than 2GB using C++ and reading them successfully with Fortran programs compiled with pgf95 using UNFORMATTED reads on the same Linux computer.

We have now exceeded the 2GB limit (more correctly, 2 to the power 31 bytes) and find the read fails in the Fortran programs with the message “attempt to read past end of file”.

The C++ program writes the data as described in the following pseudo code:

int nloci = 45000;
int nanis = 3000;
int bytesInArray = nloci * nanis * 8;
double cz (nloci,nanis);

write (bostream, bytesInArray);
write (bostream, cz);
write (bostream, bytesInArray);

The Fortran programs read the file in the following way:

double precision, dimension(:,:), allocatable :: z
integer :: nloci = 45000
integer :: nanis = 3000
allocate( z(nloci,nanis))
open(32,file=infile,status='old',form='unformatted')
read(32) z
close(32)

The Fortran code is compiled with the -Mlarge_arrays option.

The focus of our search for a solution has been the Type and Value of the first record in the file (i.e., bytesInArray).

If bytesInArray is less than 2 to the power 31, then it can be stored in a variable of Type integer and correctly specifies the number of bytes to follow in the binary array.

If bytesInArray is 2 to the power 31 or more, then it will overflow an integer and no longer specify the number of bytes in the array.
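
A minimal C++ sketch of that check, assuming 8-byte doubles and a 4-byte signed record-length field. The helper names arrayBytes and fitsInInt32 are hypothetical and are not part of the original program:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: byte count of an nloci x nanis array of doubles,
// computed in 64 bits so the product itself cannot wrap.
inline std::int64_t arrayBytes(std::int64_t nloci, std::int64_t nanis) {
    return nloci * nanis * static_cast<std::int64_t>(sizeof(double));
}

// True when the byte count can be stored in a 4-byte signed integer.
inline bool fitsInInt32(std::int64_t nbytes) {
    return nbytes <= std::numeric_limits<std::int32_t>::max();
}
```

Computing the product in a 64-bit type first matters: if the multiplication is done in int, the result has already overflowed before it can be tested.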

What does a Fortran UNFORMATTED read expect to find at the beginning of a file if the file is more than 2 GB?

Any suggestions that will help us solve our problem will be greatly appreciated.

MatColgrove · October 26, 2011, 11:26pm

Hi kb64,

The problem is that a FORTRAN variable length unformatted file is presented as a sequence of records, where each record has the layout

<NB>  <DATA>  <NB>

where
<NB> is an int indicating the number of bytes in <DATA>
<DATA> is a sequence of bytes of data
So, if the record is ‘large’ (>2GB), the size of the record overflows <NB>.

To accommodate this situation, we actually split up the record into multiple unformatted records and use <NB> to indicate split records. So your C++ program needs to accommodate this.

Since the maximum value of an int is 2147483647 (0x7fffffff), when a record is larger than this value, it must be written in chunks. Each continued chunk carries 2147483639 (0x7fffffff-8) bytes of data. The record length before and after those 2147483639 bytes of data has the value (2147483639|0x80000000), i.e., an int with the sign bit on, which indicates that the chunk is continued.

To write a large unformatted record, your C++ file output code should look something like:

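int64 recsize                       // number of bytes in the record
char *p                             // pointer to data to be written
int csz = 2147483639 | 0x80000000   // chunk size with the sign bit on
while (recsize > 2147483639) {
   write(bostream, &csz, 4)         // write the 4-byte length with the sign bit on
   write(bostream, p, 2147483639)   // write 2147483639 bytes of p
   write(bostream, &csz, 4)         // repeat the 4-byte length
   p += 2147483639
   recsize -= 2147483639
}

// final chunk: the remaining size now fits in 4 bytes, sign bit off
write(bostream, &recsize, 4)        // low 4 bytes of recsize (little-endian host)
write(bostream, p, recsize)
write(bostream, &recsize, 4)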

Hope this helps,
Mat
