DGX Spark direct QSFP connection only getting ~13-16 Gbps instead of expected 200G performance

Hi,

I connected two DGX Spark nodes directly using a QSFP cable (Amphenol njaakk-n911) and followed the NVIDIA NCCL / RoCE setup guide.

Configuration:

  • Direct QSFP connection between the Sparks
  • Interfaces:
    • Node1: enp1s0f1np1 → 169.254.246.117
    • Node2: enp1s0f1np1 → 169.254.224.160
  • MTU tested with 1500 and 9000
  • Jumbo ping works
  • ethtool shows:
    • Speed: 200000Mb/s
    • Link detected: yes
  • PCIe link:
    • 32GT/s x4
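
For anyone reproducing the "Jumbo ping works" check from the list above, here is a minimal sketch of how it is commonly done on Linux. It assumes iputils `ping` (where `-M do` forbids fragmentation) and plain IPv4 without IP options, so the ICMP payload is the MTU minus 28 bytes (20-byte IP header plus 8-byte ICMP header). The helper name `jumbo_ping_cmd` is made up for illustration, and it only prints the command instead of running it.

```shell
# Hypothetical helper: print the ping command that tests a given MTU end to end.
# Payload size = MTU - 20 (IPv4 header) - 8 (ICMP header).
jumbo_ping_cmd() {
    mtu="$1"
    peer="$2"
    payload=$((mtu - 28))
    echo "ping -M do -s $payload -c 3 $peer"
}

# Node1 -> Node2 with MTU 9000 (addresses taken from the list above)
jumbo_ping_cmd 9000 169.254.224.160
```

If the printed command fails while a default-size ping succeeds, the MTU is probably not set to 9000 on both ends.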

NCCL works and uses IB/RoCE:

  • NCCL INFO Using network IB
  • NCCL INFO NET/IB

However, performance is very low:

  • NCCL all_gather_perf:
    • Avg bus bandwidth: ~2.8 GB/s
  • iperf3:
    • ~13-16 Gbps
  • ib_write_bw:
    • ~12.7-13.5 Gbps

I also tested:

  • larger buffers
  • multiple QPs
  • second P2P-visible interface
  • separate /24 addressing
  • MTU 9000

Results remain around ~13 Gbps.

There are no CRC errors, and RDMA counters increase normally.

Is this expected on DGX Spark / GB10, or should I be seeing much higher throughput (~90-100 Gbps+) like other reported Spark tests?

Could this be related to:

  • CX-7 multi-host mode,
  • wrong PF/interface selection,
  • cable compatibility,
  • firmware/driver issue,
  • or some missing RoCE configuration?

Thanks.

I can’t directly help you yet (I’m still waiting on a second GB10 to be delivered), but I notice that you mixed GBps and Gbps in your post, that is, gigabytes per second vs gigabits per second (1GBps = 8Gbps). It might be best to stick to just one. I think your maximum expectation should be about 180Gbps, or 22.5GBps, in an iperf3 test with multiple concurrent UDP streams.
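
To make the unit point concrete, here is a small sketch that converts between gigabytes per second and gigabits per second (1 byte = 8 bits). The function names are made up for illustration; the input numbers are the ones quoted in this thread.

```shell
# Hypothetical helpers: convert between gigabytes/s and gigabits/s (factor of 8).
gbytes_to_gbits() { awk -v v="$1" 'BEGIN { printf "%.1f\n", v * 8 }'; }
gbits_to_gbytes() { awk -v v="$1" 'BEGIN { printf "%.3f\n", v / 8 }'; }

gbytes_to_gbits 2.8    # NCCL all_gather_perf bus bandwidth, expressed in Gbps
gbits_to_gbytes 13     # iperf3 / ib_write_bw result, expressed in GBps
gbits_to_gbytes 180    # the ceiling suggested above, expressed in GBps
```

Note that NCCL "bus bandwidth" is a derived figure, so it is not directly comparable to an iperf3 number even after converting units.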

What are you testing with? Maybe a dumb question from me, but if you’re testing file transfer you’ll be limited by the NVMe drive, which might be around that speed.

You need to test memory-to-memory reads :)
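
A memory-to-memory test in this sense can be sketched with iperf3, which by default sends generated buffers instead of reading from disk. The helper below only prints the client command; its name, the default of 8 parallel streams, and the 30 second duration are assumptions for illustration, not values recommended anywhere in this thread. `-c`, `-P` and `-t` are standard iperf3 options.

```shell
# Hypothetical helper: print (not run) an iperf3 client command for a
# memory-to-memory throughput test with several parallel streams.
iperf3_client_cmd() {
    peer="$1"
    streams="${2:-8}"     # assumed default, tune as needed
    seconds="${3:-30}"    # assumed default, tune as needed
    echo "iperf3 -c $peer -P $streams -t $seconds"
}

# On Node2 (server side), run:  iperf3 -s
# On Node1 (client side), run the command printed here:
iperf3_client_cmd 169.254.224.160
```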

Hello,

Make sure both your nodes are fully updated:

sudo apt update
sudo apt dist-upgrade
sudo fwupdmgr refresh
sudo fwupdmgr upgrade
sudo reboot

I’m assuming you really have DGX Spark units, but if it’s another vendor’s GB10:

sudo fwupdmgr enable-remote lvfs-testing
sudo fwupdmgr refresh --force
sudo fwupdmgr update

In addition to that, fully power-cycle both units:

sudo shutdown -h now
# unplug USB-C power from the back of both Sparks and unplug bricks from the outlet
# wait ~5 minutes
# plug back in and boot both

Make sure you connect ConnectX-7 cage1 to cage1, or cage2 to cage2, on both GB10 devices; don’t use different ports on the two devices. If it still isn’t working after all of this and you’re still confused, use sparkrun to configure the network for you.

Other thread references:

Is there some way to fix them and make the "Detected insufficient power on the PCIe slot" errors go away?

It doesn’t appear to affect anything in practice.

For example, if you inspect one of the ConnectX ports, you’ll notice that the reported PCIe slot power limit is 0W.

From lspci -vvv:

DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W

That likely triggers the driver warning because a standard full x16 PCIe slot is expected to provide up to 75W, and the driver may assume that behavior. In this case, though, the port is only connected as x4, and lspci does not show any declared slot power limit. So the warning seems to be caused by missing or unusual platform-reported power metadata rather than an actual functional power issue.

Yes, the insufficient power message has been brought up in various threads; it is apparently spurious and benign, and can be ignored.

Hi all,

Thank you for your interest in this topic.

While searching the forum, I found discussions mentioning compatibility issues between kernel 6.17.0 and ConnectX-7 firmware. NVIDIA had apparently released a firmware update to address this.

When I checked my DGX Spark systems, one node was already running the firmware version mentioned in those discussions, while another node had an even newer version. I upgraded all systems to the latest firmware version available to me (1.108.20).

After the upgrade, two Sparks were finally able to communicate at around 100Gbps according to iperf:

[SUM]   0.00-30.00  sec   388 GBytes   111 Gbits/sec    0             sender
[SUM]   0.00-30.01  sec   388 GBytes   111 Gbits/sec                  receiver
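
One possible way to put the 111 Gbits/sec in context is to compare it with the "32GT/s x4" PCIe link reported in the first post. The sketch below computes the line rate of such a link after 128b/130b encoding, which PCIe uses at 32 GT/s. It ignores packet header and flow-control overhead, so achievable payload throughput is somewhat lower. Whether this link is actually what limits a single interface on the Spark is an assumption, not something confirmed in this thread, and the function name is made up for illustration.

```shell
# Hypothetical helper: PCIe line rate in Gbps after 128b/130b encoding.
# Arguments: transfer rate per lane in GT/s, number of lanes.
pcie_line_rate_gbps() {
    awk -v gt="$1" -v lanes="$2" 'BEGIN { printf "%.1f\n", gt * lanes * 128 / 130 }'
}

# The link reported in the first post: 32 GT/s, 4 lanes
pcie_line_rate_gbps 32 4
```

For the x4 link this comes out at roughly 126 Gbps, so a result of about 111 Gbps over one such link would be fairly close to its ceiling once protocol overhead is included.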

I then expanded the setup to a 3-node ring topology following the official documentation:

1. Node1 (Port0) -> Node2 (Port1)
2. Node2 (Port0) -> Node3 (Port1)
3. Node3 (Port0) -> Node1 (Port1)

Strangely, after creating the ring topology, iperf bandwidth dropped again to around 10Gbps even though all firmware and software versions were up to date.

I checked dmesg and found these messages:

Detected insufficient power on the PCIe slot (27W)
AER: Multiple Uncorrectable (Fatal) error

I understand that the “insufficient power” warning is expected/known on these systems, but the fatal AER error looked concerning.
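
For anyone who wants to check their own nodes for the same two messages, here is a minimal sketch of a kernel log filter. The helper name is made up, and the unrelated `usb` line in the demo is an invented sample used only to show that it gets filtered out.

```shell
# Hypothetical helper: keep only PCIe AER and slot-power lines from kernel
# log text read on stdin.
pcie_warnings() {
    grep -i -E 'AER:|insufficient power'
}

# Demo on the two messages quoted above plus one invented, unrelated line
printf '%s\n' \
    'Detected insufficient power on the PCIe slot (27W)' \
    'usb 1-1: new high-speed USB device' \
    'AER: Multiple Uncorrectable (Fatal) error' | pcie_warnings

# On a real node (reading the kernel log usually needs root):
#   sudo dmesg | pcie_warnings
```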

What eventually fixed the issue was:

  • shutting down all Sparks,
  • unplugging power and QSFP cables,
  • waiting a few minutes,
  • pressing the power button while unplugged to discharge residual power,
  • reconnecting everything and booting again.

After doing this, iperf returned to ~100Gbps and the fatal AER errors disappeared. The “insufficient power” warnings are still present, but performance has remained stable for about 7 hours now.

At this point I am not fully sure what originally caused the 10Gbps behavior after enabling the ring topology, but for now the cluster appears stable and operating correctly.

I hope this information helps someone else facing a similar issue.

The 27W message is a normal warning: until hot-plug detects the cables, the system keeps the ports in a low-power state.

Unplugging and holding the power button for a full drain clears a different issue: a CPU low-power state that can occur if your system runs into OOM crashes. And of course, if the CPU is running in low-power mode, it can’t push data to the ConnectX-7 port fast enough.

On DGX Spark there are no PCIe slots. All devices are connected directly to the SoC. If you run lspci -vv and grep for SlotPowerLimit all are 0W, meaning there’s no physical PCI slot and not a power limit issue. I think the firmware sets SlotPowerLimit=0 causing the driver to report an insufficient power detection.
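
The `lspci` check described above can be sketched as follows. The helper name is made up for illustration; it reads `lspci -vv` output on stdin and prints each distinct `SlotPowerLimit` value once. The demo input is the line quoted earlier in this thread.

```shell
# Hypothetical helper: list the distinct SlotPowerLimit values found in
# `lspci -vv` output read on stdin.
slot_power_limits() {
    grep -o 'SlotPowerLimit [0-9.]*W' | sort -u
}

# Demo on the DevCap line quoted earlier in the thread
printf '%s\n' \
    'ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W' | slot_power_limits

# On a real node (root is needed to read the capability registers):
#   sudo lspci -vv | slot_power_limits
```

If every device reports 0W, the output is a single line, which matches the observation above.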

While you are right that there aren’t slots, there are PCIe lanes to the ConnectX-7 ports, and the firmware absolutely does apply power limits to them. The hotplug fix exists to keep the ConnectX-7 ports in a low-power state when there are no active connections on the port.