Hi everyone,
I followed the playbook in the NVIDIA guide (Try NVIDIA NIM APIs), but I am seeing considerably lower bandwidth than expected. I am currently at about 41% utilization. These are the commands I ran:
export PORT_NAME=enp1s0f0np0
export UCX_NET_DEVICES=$PORT_NAME
export NCCL_SOCKET_IFNAME=$PORT_NAME
export OMPI_MCA_btl_tcp_if_include=$PORT_NAME
export DEVICE_1_IP=169.254.155.221
export DEVICE_2_IP=169.254.174.230
mpirun -np 2 -H $DEVICE_1_IP:1,$DEVICE_2_IP:1 --mca plm_rsh_agent "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=INIT,NET $HOME/nccl-tests/build/all_gather_perf -b 16G -e 16G -f 2
Results:
# nccl-tests version 2.17.6 nccl-headers=22803 nccl-library=22803
# Collective test starting: all_gather_perf
# nThread 1 nGpus 1 minBytes 17179869184 maxBytes 17179869184 step: 2(factor) warmup iters: 1 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 96122 on prior-node device 0 [000f:01:00] NVIDIA GB10
# Rank 1 Group 0 Pid 28364 on posterior-node device 0 [000f:01:00] NVIDIA GB10
#
# out-of-place in-place
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
17179869184 2147483648 float none -1 839441 20.47 10.23 0 836266 20.54 10.27 0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 10.2523
#
# Collective test concluded: all_gather_perf
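For anyone checking the numbers, here is a rough sketch of how the reported bandwidths and the 41% figure can be reproduced from the result row. It uses the nccl-tests convention that all_gather bus bandwidth is algbw * (n - 1) / n. The 25 GB/s link rate (a 200 Gb/s port) is an assumption on my part and is not stated anywhere in the output above.

```shell
# Values copied from the out-of-place columns of the result row.
size_bytes=17179869184
time_us=839441
nranks=2

# ASSUMPTION: 200 Gb/s link, i.e. 25 GB/s. Not confirmed by the output above.
link_gbytes_per_s=25

# algbw = bytes / seconds, expressed in GB/s (10^9 bytes per second).
algbw=$(awk -v s="$size_bytes" -v t="$time_us" \
  'BEGIN { printf "%.2f", s / (t * 1e-6) / 1e9 }')

# all_gather bus bandwidth: algbw * (n - 1) / n.
busbw=$(awk -v s="$size_bytes" -v t="$time_us" -v n="$nranks" \
  'BEGIN { printf "%.2f", s / (t * 1e-6) / 1e9 * (n - 1) / n }')

# Utilization as a percentage of the assumed link rate.
util=$(awk -v s="$size_bytes" -v t="$time_us" -v n="$nranks" -v l="$link_gbytes_per_s" \
  'BEGIN { printf "%.0f", 100 * s / (t * 1e-6) / 1e9 * (n - 1) / n / l }')

echo "algbw=${algbw} GB/s busbw=${busbw} GB/s utilization=${util}%"
```

With the row above this gives 20.47 GB/s algbw and 10.23 GB/s busbw, matching the table, and about 41% of the assumed 25 GB/s.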
When I run with NCCL_DEBUG=INFO, I see:
NCCL INFO NET/IB : GPU Direct RDMA Disabled for HCA 0 'rocep1s0f0'
NCCL INFO NET/IB : GPU Direct RDMA Disabled for HCA 1 'rocep1s0f1'
NCCL INFO NET/IB : GPU Direct RDMA Disabled for HCA 2 'roceP2p1s0f0'
NCCL INFO NET/IB : GPU Direct RDMA Disabled for HCA 3 'roceP2p1s0f1'
Output of ibdev2netdev:
rocep1s0f0 port 1 ==> enp1s0f0np0 (Up)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Up)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Up)
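To see which of the four HCAs in the debug lines belongs to the interface in PORT_NAME, the mapping can be looked up with a small filter. This is only a sketch: `hca_for_netdev` is a hypothetical helper name, and the sample text is the listing pasted above rather than a live call.

```shell
# Hypothetical helper: read ibdev2netdev-style lines on stdin and print the
# RDMA device mapped to the network interface given as the first argument.
hca_for_netdev() {
  awk -v dev="$1" '$4 == "==>" && $5 == dev { print $1 }'
}

# Sample input copied from the listing above, so the sketch runs anywhere.
sample_output='rocep1s0f0 port 1 ==> enp1s0f0np0 (Up)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Up)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Up)'

hca=$(printf '%s\n' "$sample_output" | hca_for_netdev enp1s0f0np0)
echo "$hca"
```

On a real node the same function can be fed directly: `ibdev2netdev | hca_for_netdev "$PORT_NAME"`. For enp1s0f0np0 it returns rocep1s0f0, which is HCA 0 in the debug lines.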
How can I increase my bandwidth utilization? I would have expected enabling GPUDirect RDMA (GDR) to help, but with the new unified CPU-GPU architecture, I am not sure whether that is required.
Thank you