Hi,
I connected two DGX Spark nodes directly using a QSFP cable (Amphenol njaakk-n911) and followed the NVIDIA NCCL / RoCE setup guide.
Configuration:
- Direct QSFP connection between the Sparks
- Interfaces:
  - Node1: enp1s0f1np1 → 169.254.246.117
  - Node2: enp1s0f1np1 → 169.254.224.160
- MTU tested with 1500 and 9000
- Jumbo ping works
- ethtool shows:
  - Speed: 200000Mb/s
  - Link detected: yes
- PCIe link: 32GT/s x4
NCCL works and uses IB/RoCE:
- NCCL INFO Using network IB
- NCCL INFO NET/IB
However, performance is very low:
- NCCL all_gather_perf: avg bus bandwidth ~2.8 GB/s
- iperf3: ~13-16 Gbps
- ib_write_bw: ~12.7-13.5 Gbps
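To put these figures on one scale, here is a rough sketch that converts everything to Gbit/s and compares it with the 200 Gbit/s link speed and the PCIe ceiling. It assumes 128b/130b line encoding for the 32GT/s x4 link and ignores PCIe protocol overhead, so the ceiling of about 126 Gbit/s is an upper bound, not a measured value. The helper names are my own.

```python
# Normalize the reported figures to Gbit/s so they can be compared directly.

def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert GB/s (the unit of the NCCL bus bandwidth figure) to Gbit/s."""
    return gbytes_per_s * 8.0


def pcie_payload_gbits(gt_per_s: float, lanes: int) -> float:
    """Raw PCIe data rate after 128b/130b line encoding.

    Assumption: 128b/130b encoding; TLP/DLLP protocol overhead is ignored,
    so real usable throughput is somewhat lower.
    """
    return gt_per_s * lanes * 128.0 / 130.0


LINK_GBITS = 200.0  # from ethtool: Speed: 200000Mb/s

# Upper end of each measured range from the post.
measured = {
    "NCCL all_gather_perf busbw": gbytes_to_gbits(2.8),
    "iperf3": 16.0,
    "ib_write_bw": 13.5,
}

pcie_gbits = pcie_payload_gbits(32.0, 4)
print(f"PCIe 32GT/s x4 ceiling: {pcie_gbits:.1f} Gbit/s")

for name, gbits in measured.items():
    pct_link = 100.0 * gbits / LINK_GBITS
    pct_pcie = 100.0 * gbits / pcie_gbits
    print(f"{name}: {gbits:.1f} Gbit/s "
          f"({pct_link:.1f}% of link, {pct_pcie:.1f}% of PCIe ceiling)")
```

By this estimate, every measurement sits far below both the link speed and the PCIe limit of a single x4 function.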
I also tested:
- larger buffers
- multiple QPs
- second P2P-visible interface
- separate /24 addressing
- MTU 9000
Results stay at about 13 Gbps in every case.
There are no CRC errors, and RDMA counters increase normally.
Is this expected on DGX Spark / GB10, or should I be seeing much higher throughput (~90-100 Gbps+) like other reported Spark tests?
Could this be related to:
- CX-7 multi-host mode,
- wrong PF/interface selection,
- cable compatibility,
- firmware/driver issue,
- or some missing RoCE configuration?
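To help check the PF/interface selection point, this is a minimal sketch of how I could list which RDMA device sits behind which network interface, together with the port state and rate the kernel reports. It assumes the usual Linux sysfs layout under /sys/class/infiniband. The function name `rdma_netdev_map` is hypothetical and not part of any NVIDIA tool.

```python
from pathlib import Path


def _read(path: Path) -> str:
    """Return the stripped contents of a sysfs file, or '' if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


def rdma_netdev_map(sysfs_root: str = "/sys/class/infiniband") -> dict:
    """Map each RDMA device to its netdevs and per-port state/rate.

    Assumption: the standard sysfs layout, i.e.
    <root>/<rdma_dev>/device/net/<netdev> and
    <root>/<rdma_dev>/ports/<n>/{state,rate}.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result  # no RDMA devices exposed on this machine

    for dev in sorted(root.iterdir()):
        net_dir = dev / "device" / "net"
        netdevs = sorted(p.name for p in net_dir.iterdir()) if net_dir.is_dir() else []

        ports = {}
        ports_dir = dev / "ports"
        if ports_dir.is_dir():
            for port in sorted(ports_dir.iterdir()):
                ports[port.name] = {
                    "state": _read(port / "state"),
                    "rate": _read(port / "rate"),
                }
        result[dev.name] = {"netdevs": netdevs, "ports": ports}
    return result


if __name__ == "__main__":
    for rdma_dev, info in rdma_netdev_map().items():
        print(rdma_dev, "->", ", ".join(info["netdevs"]) or "(no netdev)")
        for port, attrs in info["ports"].items():
            print(f"  port {port}: state={attrs['state']} rate={attrs['rate']}")
```

Running it on both nodes would show whether the RDMA device used in the tests is the one attached to enp1s0f1np1.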
Thanks.