MPICH compiler error

Folks,
We are using the MPICH compiler under PGI for linux86 6.1.
We have a model, and it runs fine on 2 nodes.

But when the number of nodes is increased to 4 or 8, we get the following error:

p5_6255: p4_error: interrupt SIGFPE: 8
p4_10946: p4_error: interrupt SIGFPE: 8
p6_5792: p4_error: interrupt SIGFPE: 8
p3_6883: p4_error: interrupt SIGFPE: 8
p2_7224: p4_error: interrupt SIGFPE: 8
p7_4946: p4_error: interrupt SIGFPE: 8
p1_30552: p4_error: interrupt SIGFPE: 8

p0_31756: p4_error: interrupt SIGFPE: 8


--samky.e2235------
Warning: No xauth data; using fake authentication data for X11 forwarding.
Warning: No xauth data; using fake authentication data for X11 forwarding.
Warning: No xauth data; using fake authentication data for X11 forwarding.
Warning: No xauth data; using fake authentication data for X11 forwarding.
Warning: No xauth data; using fake authentication data for X11 forwarding.
Warning: No xauth data; using fake authentication data for X11 forwarding.
Warning: No xauth data; using fake authentication data for X11 forwarding.
Fatal error; unknown error handler
May be MPI call before MPI_INIT. Error message is MPI_ALLREDUCE and code is 197
Fatal error; unknown error handler
May be MPI call before MPI_INIT. Error message is MPI_ALLREDUCE and code is 197
Fatal error; unknown error handler
May be MPI call before MPI_INIT. Error message is MPI_ALLREDUCE and code is 197
Fatal error; unknown error handler
May be MPI call before MPI_INIT. Error message is MPI_ALLREDUCE and code is 197
Fatal error; unknown error handler
May be MPI call before MPI_INIT. Error message is MPI_ALLREDUCE and code is 197
Fatal error; unknown error handler
May be MPI call before MPI_INIT. Error message is MPI_ALLREDUCE and code is 197
Fatal error; unknown error handler
May be MPI call before MPI_INIT. Error message is MPI_ALLREDUCE and code is 197
Fatal error; unknown error handler
May be MPI call before MPI_INIT. Error message is MPI_ALLREDUCE and code is 197

Any solutions?

Hi Amjad Majid Ali,


You're getting a floating point exception (SIGFPE), such as a divide by zero, somewhere in your code. Assuming you have the PGI CDK product, the first thing to do is run your program under the PGDBG debugger. To launch the debugger with MPI, add the flag "-dbg=pgdbg" to your mpirun command. PGDBG will be able to trap this error and give you a better idea of where the problem is. You should also compile your code with "-g" if you want to view the source code instead of just the disassembly.

You can also compile your code with "-Ktrap=fp" to trap the FPE. The application will abort when an FPE is encountered and print a message indicating the exact exception raised as well as the line number in the program. It won't tell you the exact problem, but it should give you an idea.

Hope this helps,
Mat