Getting warnings related to IEEE inexact with PGI's MPI compiler

Hello,
I am using PGI Community Edition version 19.10 (mpif90) to build and run a sample MPI program on an Ubuntu 16.04 x86_64 machine with an NVIDIA V100:
https://people.sc.fsu.edu/~jburkardt/f_src/prime_mpi/prime_mpi.f90

I am able to compile it successfully with the command below:
/opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpif90 -o prime_pgi prime.f90
When I run it with:
/opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpirun --allow-run-as-root -np 1 ./prime_pgi
Output Log:
libibverbs: Warning: couldn’t open config directory ‘/usr/etc/libibverbs.d’.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: couldn’t open config directory ‘/usr/etc/libibverbs.d’.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
--------------------------------------------------------------------------
[[65328,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 60a69b732c06

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
17 July 2020  12:46:55.898 PM

PRIME_MPI
  FORTRAN90/MPI version

  An MPI example program to count the number of primes.
  The number of processes is        1

         N        Pi          Time

         1         0  0.645239E-04
         2         1  0.101514E-05
         4         2  0.701286E-06
         8         4  0.739004E-06
        16         6  0.964850E-06
        32        11  0.188220E-05
        64        18  0.409596E-05
       128        31  0.110879E-04
       256        54  0.340450E-04
       512        97  0.117180E-03
      1024       172  0.402481E-03
      2048       309  0.144204E-02
      4096       564  0.518912E-02
      8192      1028  0.178888E-01
     16384      1900  0.632647E-01
     32768      3512  0.233272
     65536      6542  0.622287
    131072     12251   2.39661

PRIME_MPI:
  Normal end of execution.

17 July 2020  12:46:59.289 PM
Warning: ieee_inexact is signaling
FORTRAN STOP

I found that, to locate the source of the ieee_inexact warning, we can use the -Ktrap=inexact flag during compilation:
/opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpif90 -Ktrap=inexact -o prime_pgi_trap prime.f90
Run: /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpirun --allow-run-as-root -np 1 ./prime_pgi_trap
Log:
libibverbs: Warning: couldn’t open config directory ‘/usr/etc/libibverbs.d’.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
[60a69b732c06:09253] *** Process received signal ***
[60a69b732c06:09253] Signal: Floating point exception (8)
[60a69b732c06:09253] Signal code: Floating point inexact result (6)
[60a69b732c06:09253] Failing at address: 0x7f217d342ad8
[60a69b732c06:09253] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f217f57a890]
[60a69b732c06:09253] [ 1] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(+0x1ecad8)[0x7f217d342ad8]
[60a69b732c06:09253] [ 2] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(+0x1ec769)[0x7f217d342769]
[60a69b732c06:09253] [ 3] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(+0x7e266)[0x7f217d1d4266]
[60a69b732c06:09253] [ 4] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(mca_base_framework_components_open+0x4d)[0x7f217d1d414d]
[60a69b732c06:09253] [ 5] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(mca_base_framework_open+0xb1)[0x7f217d1de981]
[60a69b732c06:09253] [ 6] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(opal_init+0x122)[0x7f217d1b2852]
[60a69b732c06:09253] [ 7] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-rte.so.40(orte_init+0xcb)[0x7f217db48d3b]
[60a69b732c06:09253] [ 8] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/libmpi.so.40(ompi_mpi_init+0x304)[0x7f2180cdc164]
[60a69b732c06:09253] [ 9] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/libmpi.so.40(PMPI_Init+0x8c)[0x7f2180d188ec]
[60a69b732c06:09253] [10] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/libmpi_mpifh.so.40(mpi_init__+0x29)[0x7f2182524739]
[60a69b732c06:09253] [11] ./prime_pgi_trap[0x40154f]
[60a69b732c06:09253] [12] ./prime_pgi_trap[0x4014f3]
[60a69b732c06:09253] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f217e484b97]
[60a69b732c06:09253] [14] ./prime_pgi_trap[0x4013fa]
[60a69b732c06:09253] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node 60a69b732c06 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------

However, when I build and run the same code with a standard Open MPI installation (without PGI), I do not get any warning or error:
/opt/ompi/bin/mpif90 -o prime_ompi4 prime.f90
Run:
/opt/ompi/bin/mpirun --allow-run-as-root -np 1 ./prime_ompi4
Log:
17 July 2020 12:47:46.578 PM

PRIME_MPI
  FORTRAN90/MPI version

  An MPI example program to count the number of primes.
  The number of processes is        1

         N        Pi          Time

         1         0  0.336960E-04
         2         1  0.842000E-06
         4         2  0.857000E-06
         8         4  0.111200E-05
        16         6  0.148800E-05
        32        11  0.270900E-05
        64        18  0.618200E-05
       128        31  0.136750E-04
       256        54  0.392750E-04
       512        97  0.144316E-03
      1024       172  0.443880E-03
      2048       309  0.158926E-02
      4096       564  0.574606E-02
      8192      1028  0.213788E-01
     16384      1900  0.770046E-01
     32768      3512  0.248138
     65536      6542  0.753737
    131072     12251   2.99614

PRIME_MPI:
  Normal end of execution.

17 July 2020  12:47:50.685 PM

So no warning or error appears with the standard Open MPI build.
Could you suggest what I might be missing or doing wrong with the PGI compilers?

Thanks,
Pavan

The Fortran standard says that when a STOP statement is executed, the processor must report which IEEE floating-point exception flags are signaling. That is why you see “Warning: ieee_inexact is signaling”. You shouldn’t really worry about inexact: almost every program that does floating-point arithmetic produces inexact results, because any result that cannot be represented exactly has to be rounded. So when you enable trapping on inexact operations, the program traps almost immediately; in your run, the trace shows the trap firing inside Open MPI’s own initialization (opal_init, called from MPI_Init), before any of the prime-counting code ran. You can turn off the extra printing of the exception status at the STOP statement by setting the environment variable NO_STOP_MESSAGE.

Thanks for the clarification. With a different example, I am also getting the warning “Warning: ieee_invalid is signaling”, and the output is NaN. Is this a problem with my code?

Most likely, yes, it is a problem in your code. Compile and link with -Ktrap=inv to find out where the invalid operation occurs.
