buffer issue with Fortran compiler??

abhivg · July 11, 2007, 9:05am

I am trying to run a 64 bit application built using PGI fortran compiler.
Initially the application was not displaying the expected output on screen. So I added some debug ‘write’ statements in the code and tried after building again. Now the expected output got displayed along with my debug write statements.

The same application when built with Intel Fortran compiler runs fine.
Is this some output buffer issue, or is there some other problem? It would be really helpful if someone could point me in the right direction.

BTW this application is a 64 bit MPI application. However I am testing only on a single machine. Also the application contains both Fortran and C code. I have used Microsoft Visual C++ Compiler to build the C code.
I am using pgf90 on Windows with MSMPI.

thanks in advance.
Abhishek

MatColgrove · July 11, 2007, 4:21pm

Hi Abhishek,

> I am trying to run a 64 bit application built using PGI fortran compiler.
> Initially the application was not displaying the expected output on screen. So I added some debug ‘write’ statements in the code and tried after building again. Now the expected output got displayed along with my debug write statements.

Ah, the Heisenberg debugging problem. The most likely reason for this is that the compiler is performing an optimization which gets inhibited when you insert the WRITE statements. To test this theory, compile your original program (no WRITEs) without optimization, “-O0”. Do you now get the expected answer? If so, then the next step is to narrow down which optimization is causing the problem and exactly where in the code the problem occurs.

For problem 1, the optimization, start by compiling at “-O1” then “-O2”, “-fast”, etc. until you’ve found the optimization that’s causing the problem. Once there, reinsert the WRITE statements, either in a binary search or serially, until the problem disappears again. While doing this keep a log of the “-Minfo” messages (add this flag to the compilation to have the compiler display which optimizations it’s performing). Compare the passing and failing Minfo logs to determine what optimizations were inhibited and where in the source they occur. The final step is to compile again with the problem optimization and “-gopt”. Then use the PGI debugger, PGDBG, to determine the exact cause.

If you’re still failing at “-O0”, then look for UMRs (uninitialized memory reads) or porting problems. The same method I described above, except for the optimization hunting, will apply to these types of errors as well.

While this does seem like a lot of work and you may be tempted to simply use Intel or the lowest passing optimization, if this is a bug in your program then you may have problems in the future, or if it’s a problem with the PGI compilers, we would very much appreciate a report sent to trs@pgroup.com so we can fix the issue.

Best Regards,
Mat

abhivg · July 13, 2007, 9:02am

Hi Mat,

I tried without optimizations (-O0) but it didn’t help. I am not really sure if it’s an issue with the compiler (the thread title may seem a bit misleading) or something else. Will get back in case of any updates.

thanks,
Abhishek

MatColgrove · July 13, 2007, 7:32pm

Another thought is that you have an array out-of-bounds error. Try compiling with “-Mbounds” to see if anything shows up.

Mat

abhivg · July 18, 2007, 9:46am

I tried with -Mbounds, but no change.

I am testing the application by using:

mpiexec -n 4 appname.exe

The application has a call to MPI_Abort which should kill all 4 processes. The rank 0 process should be printing the output, and then the abort should happen.

I read in some forum that the problem might be with the output buffers not getting flushed before the call to abort. However I confirmed that both stdout and stderr are being flushed explicitly before calling abort. I am trying to set the buffering on ‘stdout’ and ‘stderr’ to unbuffered by using setvbuf3f to see if this works.

Abhishek

PeteBradley · July 23, 2007, 2:57pm

I think what you described in your original note is an issue getting the output of a write statement. Most Fortran implementations use delayed writes to improve performance. This happens to all units, but MPI can compound this with unit 6 because that output comes back through the MPI implementation.

Are you doing a flush after the write, as in:

write (6, *) 'Hello world!'
call flush(6)

If not, I’d start there. flush() is nonstandard but available in one way or another on most common platforms (except AIX, last I checked).
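
For compilers that support Fortran 2003, the standard FLUSH statement is an alternative to the nonstandard flush() routine. The following is only a minimal sketch, assuming such a compiler; the program name is made up for illustration, and output_unit from iso_fortran_env is the standard name for the unit that is usually 6.

```fortran
program flush_demo
  ! Hypothetical example: uses the Fortran 2003 FLUSH statement
  ! instead of the nonstandard flush() subroutine.
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none

  write (output_unit, '(a)') 'Hello world!'
  ! Ask the runtime to push any buffered data for this unit out now.
  flush (output_unit)
end program flush_demo
```

Note that this only empties the Fortran runtime's buffer; under MPI the output may still be held by the launcher on its way back to the console.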

Next, if your program writes something to unit 6 and then immediately calls MPI_Abort, it is quite common to have the last output swallowed. Solutions are kludgy. The ugliest but most effective is to use the flush as above and sleep 10 seconds before bailing out of MPI.
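
A rough sketch of that flush-then-sleep workaround might look like the following. It is an illustration under assumptions, not a tested fix: the program name and message are invented, the "use mpi" module may need to be replaced by include 'mpif.h' depending on the MPI installation, and sleep() is a nonstandard extension that many compilers provide but the Fortran standard does not.

```fortran
program abort_after_flush
  ! Hypothetical example of flushing and pausing before MPI_Abort.
  use mpi          ! assumption: some installations need include 'mpif.h' instead
  implicit none
  integer :: ierr, rank

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  if (rank == 0) then
    write (6, *) 'Fatal condition detected, aborting'
    call flush(6)    ! nonstandard, as in the post above
    call sleep(10)   ! nonstandard extension: give the output time to reach the console
    call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  else
    ! Other ranks wait here until rank 0's abort terminates the job.
    call MPI_Barrier(MPI_COMM_WORLD, ierr)
  end if

  call MPI_Finalize(ierr)   ! not reached once the abort has taken effect
end program abort_after_flush
```

It would be launched the same way as the application in the thread, for example with mpiexec -n 4. The 10 second delay is arbitrary; a shorter pause may be enough on a single machine.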

Pete