Two network cards that share a PCIe bridge (or a common NUMA node) cause a slowdown

[image: server configuration]
The above is the configuration of my servers. Each server has eight 400G network cards, connected to two 800G switches. When testing with ib_write_bw, I found that two network cards sharing a NUMA node slow each other down: when the two NICs run at the same time, the speed drops to 187 Gb/s. I suspect this is caused by sharing a NUMA node. What can I do to achieve 394 Gb/s on each NIC? Here are the commands I use to run ib_write_bw and nccl-tests:
server: ib_write_bw -d mlx5_0 -q 5 --port 10002 --report_gbits --run_infinitely -F -n 2000
client: ib_write_bw -d mlx5_0 -q 5 --port 10002 --report_gbits --run_infinitely -F -n 2000 10.102.20.5

mpirun -np 16 -H 10.102.20.5:8,10.102.20.6:8 --allow-run-as-root -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x NCCL_IB_GID_INDEX=3 -x NCCL_IB_DISABLE=0 -x NCCL_SOCKET_IFNAME=eth0 -x NCCL_IB_HCA=mlx5_0,mlx5_1,mlx5_2,mlx5_3,mlx5_4,mlx5_5,mlx5_6,mlx5_7,mlx5_8 -x NCCL_NET_GDR_LEVEL=2 -x NCCL_IB_QPS_PER_CONNECTION=4 -x NCCL_IB_TC=160 -x NCCL_IB_TIMEOUT=22 -x NCCL_PXN_DISABLE=0 -x NCCL_MIN_CTAS=4 -x LD_LIBRARY_PATH -x PATH -mca coll_hcoll_enable 0 -mca pml ob1 -mca btl_tcp_if_include eth0 -mca btl ^openib ./build/all_reduce_perf -b 1M -e 1G -n 1000 -g 1

For the ib_write_bw slowness: have you tried pinning the processes to different cores (under the same NUMA node)?
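To choose cores under the same NUMA node as a NIC, it helps to know which node and which CPUs each device is local to. The following is a minimal sketch, assuming a Linux host where the RDMA devices appear under /sys/class/infiniband and expose the standard PCI sysfs attributes numa_node and local_cpulist. The function name nic_locality and its optional root-directory argument are illustrative, not part of any tool.

```shell
#!/bin/sh
# Print the NUMA node and local CPU list of every RDMA device.
# The optional first argument overrides the sysfs root (hypothetical
# convenience so the function can be tried without RDMA hardware).
nic_locality() {
    root="${1:-/sys/class/infiniband}"
    for dev in "$root"/*; do
        # Skip the unexpanded glob when no devices are present.
        [ -d "$dev" ] || continue
        name=$(basename "$dev")
        node=$(cat "$dev/device/numa_node" 2>/dev/null || echo "?")
        cpus=$(cat "$dev/device/local_cpulist" 2>/dev/null || echo "?")
        printf '%s numa_node=%s local_cpus=%s\n' "$name" "$node" "$cpus"
    done
}

nic_locality
```

Two NICs that report the same numa_node can then each be given a different core from the shared local_cpus range, for example with taskset -c.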

I have little knowledge of this. What should I do? Which commands should I run, and what results should I provide for you?

I ran the commands below, and the speed is now closer to the full rate. I am not sure whether I ran them correctly:

taskset -c 0 ib_write_bw -d mlx5_0 -q 5 --port 10002 --report_gbits --run_infinitely -F -n 2000
taskset -c 0 ib_write_bw -d mlx5_0 -q 5 --port 10002 --report_gbits --run_infinitely -F -n 2000 10.102.20.5

taskset -c 1 ib_write_bw -d mlx5_1 -q 5 --port 10002 --report_gbits --run_infinitely -F -n 2000
taskset -c 1 ib_write_bw -d mlx5_1 -q 5 --port 10002 --report_gbits --run_infinitely -F -n 2000 10.102.20.5
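One thing to check in the commands above: both pairs use --port 10002. That option sets the TCP port ib_write_bw uses for its initial handshake, so two servers started at the same time on one host would likely conflict. The sketch below only prints pinned command lines with a distinct port per NIC. It is a dry-run helper under stated assumptions: the function name pinned_bw_cmd and port 10003 are illustrative, and every other flag is copied from the commands in this thread.

```shell
#!/bin/sh
# Build one pinned ib_write_bw command line (dry run: prints only).
# Arguments: device, core, TCP port, optional server IP (client side).
pinned_bw_cmd() {
    dev=$1
    core=$2
    port=$3
    peer=${4:-}
    cmd="taskset -c $core ib_write_bw -d $dev -q 5 --port $port --report_gbits --run_infinitely -F -n 2000"
    if [ -n "$peer" ]; then
        cmd="$cmd $peer"
    fi
    printf '%s\n' "$cmd"
}

# Server side: one instance per NIC, each on its own core and port.
pinned_bw_cmd mlx5_0 0 10002
pinned_bw_cmd mlx5_1 1 10003
# Client side: same pairing, plus the server address used in this thread.
pinned_bw_cmd mlx5_0 0 10002 10.102.20.5
pinned_bw_cmd mlx5_1 1 10003 10.102.20.5
```

Running the two server lines in separate terminals, then the two client lines, exercises both NICs concurrently. The core numbers 0 and 1 are taken from the commands above and may need adjusting to cores that are local to each NIC.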