Hello,
I am using PGI Community Edition 19.10 (mpif90) to run a sample MPI program on an Ubuntu 16.04 x86_64 machine with an NVIDIA V100:
https://people.sc.fsu.edu/~jburkardt/f_src/prime_mpi/prime_mpi.f90
It compiles successfully with the following command:
/opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpif90 -o prime_pgi prime.f90
When I run it:
/opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpirun --allow-run-as-root -np 1 ./prime_pgi
Output Log:
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
--------------------------------------------------------------------------
[[65328,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 60a69b732c06
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
17 July 2020 12:46:55.898 PM
PRIME_MPI
FORTRAN90/MPI version
An MPI example program to count the number of primes.
The number of processes is 1
     N     Pi  Time
     1      0  0.645239E-04
     2      1  0.101514E-05
     4      2  0.701286E-06
     8      4  0.739004E-06
    16      6  0.964850E-06
    32     11  0.188220E-05
    64     18  0.409596E-05
   128     31  0.110879E-04
   256     54  0.340450E-04
   512     97  0.117180E-03
  1024    172  0.402481E-03
  2048    309  0.144204E-02
  4096    564  0.518912E-02
  8192   1028  0.178888E-01
 16384   1900  0.632647E-01
 32768   3512  0.233272
 65536   6542  0.622287
131072  12251  2.39661
PRIME_MPI:
Normal end of execution.
17 July 2020 12:46:59.289 PM
Warning: ieee_inexact is signaling
FORTRAN STOP
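As a side note, the openib notice in the log above is separate from my question. As the message itself says, it can be turned off through the MCA parameter btl_base_warn_component_unused. A minimal sketch of how I understand that would look, assuming Open MPI's usual OMPI_MCA_<name> environment variable convention and the mpirun path and binary name from this post:

```shell
# Sketch only: silence the "OpenFabrics (openib) ... unable to find any
# relevant network interfaces" notice named in the log.
# The path and binary name are the ones used in this post.
MPIRUN=/opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpirun

# Open MPI reads MCA parameters from OMPI_MCA_<name> environment variables.
export OMPI_MCA_btl_base_warn_component_unused=0

# Guarded so the snippet does nothing harmful on a machine without this install.
if [ -x "$MPIRUN" ]; then
    "$MPIRUN" --allow-run-as-root -np 1 ./prime_pgi
else
    echo "mpirun not found at $MPIRUN"
fi
```

The same parameter can also be passed on the command line as --mca btl_base_warn_component_unused 0. As far as I can tell this should only affect the Open MPI notice, not the libibverbs lines or the ieee_inexact warning.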
I found that, to track down this ieee_inexact warning, we can compile with the -Ktrap=inexact flag:
/opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpif90 -Ktrap=inexact -o prime_pgi_trap prime.f90
Run:
/opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpirun --allow-run-as-root -np 1 ./prime_pgi_trap
Log:
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
[60a69b732c06:09253] *** Process received signal ***
[60a69b732c06:09253] Signal: Floating point exception (8)
[60a69b732c06:09253] Signal code: Floating point inexact result (6)
[60a69b732c06:09253] Failing at address: 0x7f217d342ad8
[60a69b732c06:09253] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f217f57a890]
[60a69b732c06:09253] [ 1] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(+0x1ecad8)[0x7f217d342ad8]
[60a69b732c06:09253] [ 2] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(+0x1ec769)[0x7f217d342769]
[60a69b732c06:09253] [ 3] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(+0x7e266)[0x7f217d1d4266]
[60a69b732c06:09253] [ 4] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(mca_base_framework_components_open+0x4d)[0x7f217d1d414d]
[60a69b732c06:09253] [ 5] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(mca_base_framework_open+0xb1)[0x7f217d1de981]
[60a69b732c06:09253] [ 6] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-pal.so.40(opal_init+0x122)[0x7f217d1b2852]
[60a69b732c06:09253] [ 7] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/…/lib/libopen-rte.so.40(orte_init+0xcb)[0x7f217db48d3b]
[60a69b732c06:09253] [ 8] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/libmpi.so.40(ompi_mpi_init+0x304)[0x7f2180cdc164]
[60a69b732c06:09253] [ 9] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/libmpi.so.40(PMPI_Init+0x8c)[0x7f2180d188ec]
[60a69b732c06:09253] [10] /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/lib/libmpi_mpifh.so.40(mpi_init__+0x29)[0x7f2182524739]
[60a69b732c06:09253] [11] ./prime_pgi_trap[0x40154f]
[60a69b732c06:09253] [12] ./prime_pgi_trap[0x4014f3]
[60a69b732c06:09253] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f217e484b97]
[60a69b732c06:09253] [14] ./prime_pgi_trap[0x4013fa]
[60a69b732c06:09253] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node 60a69b732c06 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------
However, when I build and run the same code with a regular Open MPI installation instead of the PGI one, I do not get any warning or error.
/opt/ompi/bin/mpif90 -o prime_ompi4 prime.f90
Run:
/opt/ompi/bin/mpirun --allow-run-as-root -np 1 ./prime_ompi4
Log:
17 July 2020 12:47:46.578 PM
PRIME_MPI
FORTRAN90/MPI version
An MPI example program to count the number of primes.
The number of processes is 1
     N     Pi  Time
     1      0  0.336960E-04
     2      1  0.842000E-06
     4      2  0.857000E-06
     8      4  0.111200E-05
    16      6  0.148800E-05
    32     11  0.270900E-05
    64     18  0.618200E-05
   128     31  0.136750E-04
   256     54  0.392750E-04
   512     97  0.144316E-03
  1024    172  0.443880E-03
  2048    309  0.158926E-02
  4096    564  0.574606E-02
  8192   1028  0.213788E-01
 16384   1900  0.770046E-01
 32768   3512  0.248138
 65536   6542  0.753737
131072  12251  2.99614
PRIME_MPI:
Normal end of execution.
17 July 2020 12:47:50.685 PM
So the regular Open MPI build produces neither the ieee_inexact warning nor any error.
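To make the comparison between the two builds concrete, each wrapper can report which underlying compiler and flags it uses. A minimal sketch, assuming both mpif90 binaries are Open MPI wrapper compilers (which accept --showme); the helper name show_wrapper is my own and only for illustration:

```shell
# Sketch only: print the underlying compile/link command behind each wrapper.
# show_wrapper is a hypothetical helper, not part of Open MPI or PGI.
show_wrapper() {
    if [ -x "$1" ]; then
        echo "== $1"
        "$1" --showme      # Open MPI wrappers print the real command line
    else
        echo "not found: $1"
    fi
}

# Wrapper paths as used in this post.
show_wrapper /opt/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/bin/mpif90
show_wrapper /opt/ompi/bin/mpif90
```

If the two wrappers turn out to sit on top of different Fortran compilers, that alone might explain why only one build prints a message at STOP.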
Could you suggest what I might be missing or doing wrong with the PGI compilers?
Thanks,
Pavan