Hi there. I installed nvhpc_sdk_20.7 on CentOS 7.9 with CUDA driver 11.0. After installation, I ran
nvcc --version
mpirun --version
and both ran correctly. But when I tried to build one of the official examples (…/examples/MPI/samples/mpihello), I encountered the following error:
make
mpif90 -fast -o mpihello.out mpihello.f
mpif90 -fast -o mpihello_f90.out mpihello.f90
mpicc -fast -Bdynamic -o myname.out myname.c
"myname.c", line 16: warning: function "printf" declared implicitly
printf("My name is %s\n",hname);
^
"myname.c", line 11: warning: variable "ierr" was declared but never referenced
int len,ierr;
^
--------------- Executing mpihello.out ----------------------
mpirun -np 2 ./mpihello.out
[localhost:120913] *** Process received signal ***
[localhost:120913] Signal: Floating point exception (8)
[localhost:120913] Signal code: Integer divide-by-zero (1)
[localhost:120913] Failing at address: 0x7f1ba84b41a9
[localhost:120913] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7f1ba7257630]
[localhost:120913] [ 1] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-pal.so.40(+0x1151a9)[0x7f1ba84b41a9]
[localhost:120913] [ 2] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-pal.so.40(+0x116522)[0x7f1ba84b5522]
[localhost:120913] [ 3] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-pal.so.40(+0x1142a8)[0x7f1ba84b32a8]
[localhost:120913] [ 4] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-pal.so.40(+0x113d78)[0x7f1ba84b2d78]
[localhost:120913] [ 5] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-pal.so.40(+0x120abe)[0x7f1ba84bfabe]
[localhost:120913] [ 6] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-pal.so.40(opal_hwloc1117_hwloc_topology_load+0x19b)[0x7f1ba84be01b]
[localhost:120913] [ 7] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-pal.so.40(opal_hwloc_base_get_topology+0x2fe)[0x7f1ba8494c6e]
[localhost:120913] [ 8] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-rte.so.40(+0x75ae9)[0x7f1ba88caae9]
[localhost:120913] [ 9] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-rte.so.40(orte_init+0x296)[0x7f1ba8940ef6]
[localhost:120913] [10] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/../../lib/libopen-rte.so.40(orte_submit_init+0xb50)[0x7f1ba8941be0]
[localhost:120913] [11] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/mpirun[0x4013f7]
[localhost:120913] [12] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/mpirun[0x401302]
[localhost:120913] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f1ba681e555]
[localhost:120913] [14] /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/mpirun[0x401219]
[localhost:120913] *** End of error message ***
/opt/nvidia/hpc_sdk/Linux_x86_64/20.7/comm_libs/mpi/bin/mpirun: line 15: 120913 Floating point exception(core dumped) $MY_DIR/.bin/$EXE "$@"
make: *** [run] Error 136
I noticed that the trace reports "Signal code: Integer divide-by-zero (1)", but the code in mpihello.c is simply the following:
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include<stdio.h>
#include<Winsock2.h>
#pragma comment(lib, "Ws2_32.lib")
#else
#include <unistd.h>
#endif
#include "mpi.h"
main(int argc, char **argv){
    int len,ierr;
    char hname[32];
    len = 32;
    MPI_Init( &argc, &argv );
    gethostname(hname,len);
    printf("My name is %s\n",hname);
    MPI_Finalize( );
}
There is nothing in it that could cause an "Integer divide-by-zero".
I have no idea what these errors mean, and I searched the forums but found no similar topic.
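To make the long trace easier to read, here is a minimal sketch that pulls out the file name of the binary or library named in a single backtrace frame. The helper name frame_module is hypothetical (it is not part of Open MPI or the HPC SDK), and it assumes each frame line looks like the ones above: a path starting at the first '/', ending at '(' or '['.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the file name of the binary or library named
 * in one backtrace frame line into out (at most outsz - 1 characters).
 * Returns 1 on success, 0 if the line contains no path. */
int frame_module(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL);

    const char *path = strchr(line, '/');   /* the path starts at the first '/' */
    if (path == NULL || outsz == 0)
        return 0;

    size_t len = strcspn(path, "([");       /* the path ends at '(' or '[' */

    const char *base = path;                /* find the last '/' inside the path */
    for (size_t i = 0; i < len; i++)
        if (path[i] == '/')
            base = path + i + 1;

    size_t n = (size_t)(path + len - base); /* length of the file name */
    if (n >= outsz)
        n = outsz - 1;                      /* truncate to fit the buffer */
    memcpy(out, base, n);
    out[n] = '\0';
    return 1;
}
```

Applied to the frames above, this gives only libpthread.so.0, libopen-pal.so.40, libopen-rte.so.40, libc.so.6 and mpirun. None of the frames belongs to mpihello.out, so the exception seems to be raised inside mpirun itself (during opal_hwloc1117_hwloc_topology_load) rather than in the example program.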