Hello!
When I run "mpirun true" on a GPU node, I get:
"
Failed to create a queue pair (QP):
Hostname: nodec14
Requested max number of outstanding WRs in the SQ: 1
Requested max number of outstanding WRs in the RQ: 2
Requested max number of SGEs in a WR in the SQ: 2048
Requested max number of SGEs in a WR in the RQ: 1024
Requested max number of data that can be posted inline to the SQ: 0
Error: Operation not supported
Check requested attributes.
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.
You job will continue, but Open MPI will ignore the “ud” oob component
in this run.
Hostname: nodec14
"
File: /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited
* soft stack 300000
* hard stack unlimited
Thanks in advance
Hi marc2098,
I’ll ask our MPI experts to review your post, but I see that you also posted the question on Open MPI’s GitHub (Failed to create a queue pair (QP) · Issue #11236 · open-mpi/ompi · GitHub), where Jeff suggested using a later version of Open MPI with the UCX PML. You mentioned that you see the same issue with Open MPI 4.1.5 and 4.0.5, but have you tried the HPC-X version of Open MPI that we also ship? It is located in “<install_dir>/Linux_x86_64/dev/comm_libs/hpcx/hpcx-2.13/ompi/bin” and uses UCX.
-Mat
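A minimal sketch of trying the suggestion above, assuming the HPC-X Open MPI lives in the directory Mat quotes. HPCX_OMPI_BIN and run_true_with_ucx are hypothetical names introduced here, and the <install_dir> prefix is left unfilled because the thread does not give it. The "--mca pml ucx" option asks Open MPI to use the UCX PML.

```shell
# Assumption: HPCX_OMPI_BIN is a hypothetical variable. Point it at the real
# bin directory; the <install_dir> prefix from the post is kept as-is.
HPCX_OMPI_BIN="${HPCX_OMPI_BIN:-<install_dir>/Linux_x86_64/dev/comm_libs/hpcx/hpcx-2.13/ompi/bin}"

# Run the same "mpirun true" smoke test, but with the HPC-X mpirun and UCX.
run_true_with_ucx() {
    mpirun_bin="$HPCX_OMPI_BIN/mpirun"
    if [ -x "$mpirun_bin" ]; then
        # Select the UCX point-to-point messaging layer explicitly.
        "$mpirun_bin" --mca pml ucx true
    else
        echo "mpirun not found in $HPCX_OMPI_BIN"
    fi
}

run_true_with_ucx
```

If the queue pair warning still appears with this build, that would point at the node's setup, not at one particular Open MPI version.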
I talked with Chris, and he asks which version of the InfiniBand stack (OpenIB) you have installed on your system.
The thought is that it may not be compatible with the one we build Open MPI against here. He recommends checking which OpenIB version you have installed and then downloading the matching HPC-X version from https://developer.nvidia.com/networking/hpc-x
Otherwise, something in your InfiniBand setup may not be working properly, especially if none of the Open MPI builds work for you.
If this does not solve your problem, we’ll need to contact the Mellanox team to see if they have any suggestions.
-Mat
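One possible way to check the installed stack, sketched under assumptions: ofed_info is only present when an OFED distribution such as MLNX_OFED is installed, and ibv_devinfo comes with the libibverbs utilities. check_ib_stack is a hypothetical helper name; it falls back to a message when a tool is missing.

```shell
# Report the RDMA/verbs stack version and devices, if the tools are installed.
check_ib_stack() {
    if command -v ofed_info >/dev/null 2>&1; then
        # Short version string of the installed OFED distribution.
        ofed_info -s
    else
        echo "ofed_info not found (no OFED distribution detected)"
    fi
    if command -v ibv_devinfo >/dev/null 2>&1; then
        # Lists the verbs devices that libibverbs can open.
        ibv_devinfo || echo "ibv_devinfo reported no usable devices"
    else
        echo "ibv_devinfo not found (libibverbs utilities not installed)"
    fi
}

check_ib_stack
```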
Hi Mat,
Thank you for your answers. Thanks also to Chris.
I have to investigate. I don’t know how to check the OpenIB version.
I don’t have an InfiniBand connection, only two Ethernet Connection X722 for 10GBASE-T devices:
60:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 09)
60:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 09)
Marc
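Since the error text mentions "UD-capable Verbs devices" on a node without InfiniBand hardware, it may be worth checking whether the kernel has registered any verbs device at all. Some Ethernet adapters can expose one through an RDMA driver. The sketch below lists the standard sysfs location /sys/class/infiniband; list_verbs_devices is a hypothetical helper, and its optional argument exists only so the directory can be overridden.

```shell
# List verbs devices registered with the kernel, or say that there are none.
list_verbs_devices() {
    sysfs_dir="${1:-/sys/class/infiniband}"
    if [ -d "$sysfs_dir" ] && [ -n "$(ls -A "$sysfs_dir" 2>/dev/null)" ]; then
        ls "$sysfs_dir"
    else
        echo "no verbs devices registered"
    fi
}

list_verbs_devices
```

An entry here on an Ethernet-only node would suggest where Open MPI is finding the device it fails to set up.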