MPI only using 1 port on dual port IB NIC

Bryon · September 22, 2022, 8:49am

I have a test setup with two HGX A100 nodes.
Each node contains four MCX653106A-ECAT-SP cards connected with splitter cables, 8 links to 4 ports on an MQM8700. All 8 ports on both nodes are active at 100 Gb/s and have been measured at roughly line rate.

When launching MPI like this:
./mpirun -np 2 --host 10.0.99.245,10.0.99.246 -x NCCL_P2P_LEVEL=PXB singularity exec /env/nvidia_pytorch_22.08.sif ../nccl-tests2/build/all_reduce_perf -g 8 -b 32M -e 2048M -t 1 -n 200 -w 10 -f 2

The switch shows that only 1 port on each card gets used. I then tried:
./mpirun -np 2 --host 10.0.99.245,10.0.99.246 -x NCCL_IB_HCA=mlx5_0,mlx5_1,mlx5_2,mlx5_3,mlx5_6,mlx5_7,mlx5_8,mlx5_9 singularity exec /env/nvidia_pytorch_22.08.sif ../nccl-tests2/build/all_reduce_perf -g 8 -b 32M -e 2048M -t 1 -n 200 -w 10 -f 2

This still results in only mlx5_0, mlx5_2, mlx5_6 and mlx5_8 being used.
This command:
./mpirun -np 2 --host 10.0.99.245,10.0.99.246 -x NCCL_IB_HCA=mlx5_1,mlx5_3,mlx5_7,mlx5_9 singularity exec /env/nvidia_pytorch_22.08.sif ../all_reduce_perf -g 8 -b 32M -e 2048M -t 1 -n 200 -w 10 -f 2
It shows the same bandwidth and routes traffic over the other ports, indicating that all ports work correctly.

When running verbose, the output shows that GPU0 and GPU1 both pick mlx5_0 as the best option. Is there a way to set some affinity so that GPU0 runs over mlx5_0, GPU1 over mlx5_1, and so on?

ssimcoejr · October 25, 2022, 10:24pm

Hello Bryon,

Have you tried using a preceding '=', as noted in the NCCL_IB_HCA section of the Environment Variables page in the NCCL 2.15.5 documentation? It may be that the match is being performed incorrectly.

I would also recommend experimenting with the port specifier argument.
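As a rough sketch of what those two suggestions could look like together: the snippet below builds an NCCL_IB_HCA value with a leading '=' (exact name match instead of prefix match) and a ':1' port specifier on each device. The device names are the ones from the original post; using port 1 on every device is an assumption and should be checked against the actual port numbering on the nodes. The variable names are only for illustration.

```shell
# Device names taken from the original post.
devices="mlx5_0 mlx5_1 mlx5_2 mlx5_3 mlx5_6 mlx5_7 mlx5_8 mlx5_9"

# Assumption: each device exposes the port of interest as port 1.
port=1

# Join the devices into "mlx5_0:1,mlx5_1:1,...".
hca_list=""
for dev in $devices; do
    hca_list="${hca_list:+$hca_list,}${dev}:${port}"
done

# A leading '=' asks NCCL for exact matching of the listed names.
NCCL_IB_HCA_VALUE="=${hca_list}"
echo "$NCCL_IB_HCA_VALUE"
```

The resulting string can then be passed with -x NCCL_IB_HCA="$NCCL_IB_HCA_VALUE" in place of the NCCL_IB_HCA value in the original mpirun command. Whether this changes which ports carry traffic is not something this sketch can confirm.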

If this does not succeed, it may be best to open an issue on the NCCL GitHub, or engage our support team by creating a ticket via our portal at ESPCommunity for further assistance.

Best,
NVIDIA Technical Support

system · November 8, 2022, 10:24pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
