ConnectX-7 NIC in DGX Spark

Some new information, courtesy of the ServeTheHome article "The NVIDIA GB10 ConnectX-7 200GbE Networking is Really Different":

Getting 185-190Gbps required using RoCE and the NVIDIA Perftest tool, setting static IP addresses for the interfaces aligned to the 100G MACs on one of the QSFP cages, carefully ensuring we are obeying the topology of the system. If you mess this up on either the sending or receiving GB10, then you end up running two links through one PCIe Gen5 x4 link and get ~92-95Gbps. Generally, the RDMA_Write BW Test should give you ~0.176-0.188Mpps and 92-98Gbps. We are not sure why, but there was often a 4-6Gbps difference between the ports. We observed this regardless of the GB10 system vendor (NVIDIA, Dell, and ASUS.) Perhaps we have to do better core pinning or something, but at roughly 190Gbps combined, we are thoroughly in 200Gbps territory.
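As a rough sanity check on those figures, here is a small Python sketch of the arithmetic. It assumes the standard PCIe Gen5 signalling rate (32 GT/s per lane, 128b/130b encoding) and a 65536-byte message size for the RDMA write test. The message size is my assumption, since the quote does not state it, and the function names are just illustrative.

```python
# Back-of-the-envelope numbers for the quoted results.
# Assumptions (not from the article): 65536-byte messages, and raw PCIe
# line rate before any TLP/protocol overhead.

PCIE_GEN5_GT_PER_LANE = 32.0   # GT/s per lane for PCIe Gen5
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding
ASSUMED_MSG_BYTES = 65536      # assumed message size for the write test


def pcie_raw_gbps(lanes: int) -> float:
    """Raw PCIe Gen5 bandwidth in Gbps for a given lane count."""
    return PCIE_GEN5_GT_PER_LANE * ENCODING_EFFICIENCY * lanes


def gbps_from_mpps(mpps: float, msg_bytes: int = ASSUMED_MSG_BYTES) -> float:
    """Convert a message rate in Mpps to throughput in Gbps."""
    return mpps * 1e6 * msg_bytes * 8 / 1e9


if __name__ == "__main__":
    x4 = pcie_raw_gbps(4)
    print(f"PCIe Gen5 x4 raw: {x4:.1f} Gbps")
    print(f"Two x4 links raw: {2 * x4:.1f} Gbps")
    for mpps in (0.176, 0.188):
        print(f"{mpps} Mpps -> {gbps_from_mpps(mpps):.1f} Gbps")
```

Under those assumptions a single Gen5 x4 link has roughly 126 Gbps of raw bandwidth, which is not enough for two 100G MACs. That fits with the ~92-95Gbps ceiling when both flows land on the same x4 link, though it does not by itself explain the exact figure. The quoted 0.176-0.188Mpps also lines up with the 92-98Gbps per-port range if the messages are 64 KiB.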

And

What became very challenging was managing the interfaces. When you have five GB10’s, each with four ConnectX-7 interfaces, that is a lot. For a given cable, only one of the possible combinations will provide you with 200Gbps of throughput. The number of times we have gotten results in the 80-100Gbps range instead of 180-200Gbps has been a lot.
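When testing many cable and interface combinations, it may help to flag suspicious results automatically. A minimal sketch follows, with thresholds taken from the ranges quoted above (80-100Gbps versus 180-200Gbps). The function name and labels are hypothetical, and a result in the low band only suggests a topology mismatch; it does not prove one.

```python
# Hypothetical helper: classify a combined throughput measurement (Gbps)
# using the ranges mentioned in the quoted article.

def classify_combined_throughput(gbps: float) -> str:
    """Return a rough label for a combined two-port throughput result."""
    if gbps >= 180:
        return "ok"  # in the 180-200 Gbps band
    if 80 <= gbps <= 100:
        # Band the article associates with a wrong interface combination.
        return "suspect-shared-pcie-link"
    return "inconclusive"


if __name__ == "__main__":
    for measured in (188.0, 93.5, 140.0):
        print(measured, classify_combined_throughput(measured))
```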
