Hardware: 2x NVIDIA DGX Spark (GB10 SoC)
Firmware: ConnectX-7 v28.45.4028 (current)
Cable: Amphenol QSFP cable (link up at 200 Gbps / 2 lanes)
Power: Official NVIDIA DGX Spark power adapters
OS: Ubuntu (stock DGX Spark image, fully updated)
Hi all,
I have two DGX Sparks connected via the QSFP high-speed link. Both interfaces negotiate at 200 Gbps (confirmed via ethtool), but actual throughput is capped at approximately 13 Gbps — whether tested via TCP (iperf3) or RDMA (ib_write_bw). NVIDIA’s own Performance Benchmarking Guide shows expected RDMA bandwidth of 92–97 Gbps per interface (~190 Gbps combined).
What I’ve Tried
- Verified traffic is on the CX-7 link, not WiFi. Initially traffic was routing over WiFi instead of the CX-7 interfaces; after assigning static IPs to the CX-7 interfaces via netplan, traffic correctly flows over the high-speed link.
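For anyone reproducing this, here is roughly how I confirmed the traffic path (the peer address `192.168.100.2` is a placeholder for whatever static IP you assigned; substitute your actual CX-7 port name):

```shell
# Which interface does the kernel use to reach the peer Spark?
ip route get 192.168.100.2

# Confirm the CX-7 link negotiated 200 Gbps.
ethtool enp1s0f0np0 | grep -E 'Speed|Link detected'

# Watch RX/TX byte counters move during a transfer.
ip -s link show enp1s0f0np0
```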
- iperf3 testing — single stream, 16 parallel streams (`-P 16`), and 4 separate iperf3 instances on different ports. All cap at ~12.5–13 Gbps total.
- RDMA testing — `ib_write_bw` using the exact commands from NVIDIA's benchmarking guide:
```shell
# Server:
ib_write_bw -d rocep1s0f0 -F --report_gbits -q 4 -D 30
# Client:
ib_write_bw -d rocep1s0f0 -F --report_gbits -q 4 -D 30 <server-ip>
```
Result: 13.42 Gbps (expected: 92+ Gbps per interface, ~190 Gbps combined)
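To rule out a single-port artifact, both PCIe domains can be driven at once — a sketch only; the second device name comes from the PCIe topology listed further down, and `<server-ip>` stays a placeholder:

```shell
# Server side: one ib_write_bw listener per PCIe domain,
# on distinct control ports so they don't collide.
ib_write_bw -d rocep1s0f0   -F --report_gbits -q 4 -D 30 -p 18515 &
ib_write_bw -d roceP2p1s0f0 -F --report_gbits -q 4 -D 30 -p 18516 &
wait

# Client side (run both against the matching ports):
# ib_write_bw -d rocep1s0f0   -F --report_gbits -q 4 -D 30 -p 18515 <server-ip>
# ib_write_bw -d roceP2p1s0f0 -F --report_gbits -q 4 -D 30 -p 18516 <server-ip>
```

If both pairs cap at the same ~13 Gbps, the limit is per-NIC rather than per-port.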
- Ring buffers increased from 1024 to 8192 — no improvement.
- Qdisc changed from fq_codel to mq — no improvement.
- MTU set to 9978 (jumbo frames) on all interfaces — no improvement.
- CPU utilization checked during the test: 98.7% idle — not a CPU bottleneck.
- Rate limiting checked via `mlnx_qos` — all rates unlimited, no throttling configured.
- Firmware confirmed current: v28.45.4028 on both Sparks.
- SWIOTLB/bounce buffers — not in use, ruled out.
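For reference, the NIC tuning above was applied along these lines (the interface name is an assumption; repeat for each CX-7 port):

```shell
IFACE=enp1s0f0np0   # substitute your CX-7 port name

# Ring buffers: 1024 -> 8192
sudo ethtool -G "$IFACE" rx 8192 tx 8192
# Qdisc: fq_codel -> mq
sudo tc qdisc replace dev "$IFACE" root mq
# Jumbo frames
sudo ip link set dev "$IFACE" mtu 9978

# Verify the settings took effect
ethtool -g "$IFACE"
tc qdisc show dev "$IFACE"
ip link show "$IFACE" | grep -o 'mtu [0-9]*'
```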
Root Cause Found: PCIe Power Throttling
Both Sparks show this in dmesg:
```
mlx5_core 0000:01:00.0: mlx5_pcie_event: Detected insufficient power on the PCIe slot (27W).
mlx5_core 0002:01:00.0: mlx5_pcie_event: Detected insufficient power on the PCIe slot (27W).
```
This appears on both ports of both Sparks (4 messages total). The ConnectX-7 NIC is reporting that it’s not receiving enough power from the PCIe slot, which is almost certainly causing it to throttle performance.
Both Sparks are using the official NVIDIA DGX Spark power adapters — no third-party power supplies.
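If anyone wants to check for the same symptom, the event count and the power-related PCIe capability fields can be inspected like this (the grep pattern matches the dmesg lines above; exact lspci output varies by kernel):

```shell
# Count the insufficient-power events (one per affected CX-7 function).
sudo dmesg | grep -c 'mlx5_pcie_event: Detected insufficient power' || true

# Dump power-related fields from the PCIe config space of both functions.
for dev in 0000:01:00.0 0002:01:00.0; do
  echo "== $dev =="
  sudo lspci -s "$dev" -vvv | grep -i 'power'
done
```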
PCIe Topology
The ConnectX-7 operates in multi-host mode with two PCIe Gen5 x4 root ports:
- Domain 0000: enp1s0f0np0 / rocep1s0f0
- Domain 0002: enP2p1s0f0np0 / roceP2p1s0f0
lspci shows: Speed 32GT/s (ok), Width x4 (ok) — PCIe Gen5 x4 is correct.
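The link-state check can be scripted across both domains (device addresses taken from the topology above):

```shell
# Compare link capability vs. negotiated state on each CX-7 function;
# LnkSta should read "Speed 32GT/s, Width x4" for a healthy PCIe Gen5 x4 link.
for dev in 0000:01:00.0 0002:01:00.0; do
  echo "== $dev =="
  sudo lspci -s "$dev" -vvv | grep -E 'LnkCap:|LnkSta:'
done
```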
Questions
- Is the 27W PCIe power warning a known issue on DGX Spark?
- Is there a firmware or BIOS/UEFI update that addresses this power allocation?
- Has anyone else achieved the expected 92+ Gbps throughput on the inter-Spark link?
- Could this be a hardware issue requiring an RMA?
Any guidance would be appreciated. Happy to provide additional diagnostic output.
System details:
- Both Sparks fully updated (`apt update && apt upgrade`)
- Firmware: ConnectX-7 v28.45.4028