ConnectX-7 NIC in DGX Spark

I’d like to ask about the port configuration of the ConnectX-7 NIC in DGX Spark.

On our DGX Spark, the onboard ConnectX-7 NIC appears as four ports when checking with ip -br a and lspci:

roceP2p1s0f0 port 1  ==> enP2p1s0f0np0 (Down)
roceP2p1s0f1 port 1  ==> enP2p1s0f1np1 (Up)
rocep1s0f0   port 1  ==> enp1s0f0np0    (Down)
rocep1s0f1   port 1  ==> enp1s0f1np1    (Up)

However, the chassis only has two physical ports.

If anyone is familiar with this behavior, could you help explain how the NIC port mapping/logical configuration works for ConnectX-7 on DGX Spark, or point me to any official documentation or technical references that describe this layout?

This is the expected behaviour due to a limitation in the GB10 chip.

The SoC can’t provide more than an x4-wide PCIe link per device, so, to achieve the 200 Gbps speed, we had to use the ConnectX-7’s multi-host mode, aggregating two separate x4-wide PCIe links, which combined can deliver the full 200 Gbps.

As a consequence, the interfaces show up four times, because each root port has to access both interface ports through an x4 link. For maximum speed, you can aggregate all ports, or, for a single cable, aggregate enp1s0f0np0 with enP2p1s0f0np0, for instance, using balance-xor (mode 2).

More information on how to aggregate ports is available here:
NVIDIA Enterprise Support Portal | How to Configure RoCE over LAG (ConnectX-4/ConnectX-5-/ConnectX-6)
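For reference, a minimal sketch of the balance-xor bond described above, using the interface names from this thread (the IP address and hash policy are example choices, not official configuration):

```shell
# Sketch: bond the two PCIe halves of physical port 0 with balance-xor (mode 2).
# Interface names are those shown earlier in this thread; adjust for your system.
ip link add bond0 type bond mode balance-xor xmit_hash_policy layer3+4
ip link set enp1s0f0np0 down
ip link set enP2p1s0f0np0 down
ip link set enp1s0f0np0 master bond0
ip link set enP2p1s0f0np0 master bond0
ip link set bond0 up
# Example address; pick one on the same subnet as the peer Spark.
ip addr add 192.168.100.10/24 dev bond0
```

With balance-xor, each flow hashes to one slave, so multiple concurrent flows are needed to fill both x4 links.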


Is this documentation up to date? Connect Two Sparks | DGX Spark

The mapping is easier to see by listing /sys/class/net:

enp1s0f0np0 → ../../devices/pci0000:00/0000:00:00.0/0000:01:00.0/net/enp1s0f0np0
enp1s0f1np1 → ../../devices/pci0000:00/0000:00:00.0/0000:01:00.1/net/enp1s0f1np1
enP2p1s0f0np0 → ../../devices/pci0002:00/0002:00:00.0/0002:01:00.0/net/enP2p1s0f0np0
enP2p1s0f1np1 → ../../devices/pci0002:00/0002:00:00.0/0002:01:00.1/net/enP2p1s0f1np1

The interface with a P2 in the label is just on a different bus, PCIe bus 2 in this case.
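One way to print the same interface-to-PCIe mapping without reading symlinks by hand is to query each interface’s bus info with ethtool (a generic sketch; it assumes ethtool is installed and the interfaces are named `en*` as above):

```shell
# Print each network interface alongside its PCIe address (domain:bus:dev.fn).
# On DGX Spark, the enP2* interfaces report domain 0002 instead of 0000.
for dev in /sys/class/net/en*; do
    name=$(basename "$dev")
    printf '%-16s %s\n' "$name" "$(ethtool -i "$name" | awk '/bus-info/ {print $2}')"
done
```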


Has anyone been able to get it to work?
I’ve gone through both the automatic and manual setups, and I’m still not connected and still getting errors.
This might be one for a video walkthrough.
If we’re wasting this much time on the basics, using the Sparks for training becomes an inefficient time drain.

If you started with option 1 (auto connect), make sure you remove its address if it happens to differ from the one in the manual setup. After that, the manual setup or the second step-by-step works.

This worked for me. I tried Option 1 (Automatically configure SSH), as described in the Connect Two Sparks playbook.

Hi everyone, my “two Spark” kit just arrived from our partner, and I’ve started to play with it.
ethtool confirms that the DAC link is 200G, yet I’m getting just 98.2 Gbps max when running iperf3.
iperf3 is a powerful tool, but it comes with tons of parameters and tuning options, so I’m probably just missing something; I moved on.
But when I try the “NCCL for Two Sparks“ example, I get these results:

So I just want to check whether these are normal results or whether something is off in my two-kit setup.
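As a side note on the iperf3 number: a single TCP stream often can’t saturate a 200G link, so a common first tuning step is to run several parallel streams (the peer address below is an example, not from the playbook):

```shell
# Single-stream iperf3 typically plateaus well below 200 Gbps on high-speed links.
# -P 8 runs eight parallel streams; -t 30 runs the test for 30 seconds.
iperf3 -c 192.168.100.11 -P 8 -t 30
```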


Can we get some more clarification on this? Is 200Gbps achievable via a single port/single cable connection? If so, based on your post, we have to aggregate enp1s0f0np0 with enP2p1s0f0np0, even though it’s one physical port?

It would be good if you updated documentation and this playbook for clarity, because it says 200 Gbps everywhere without mentioning aggregate or addressing the port/PCIe mapping.

You’ll need to aggregate two of the 100G halves to achieve 200G with a single cable, even though they are a single physical port.

The playbook mentions that two interfaces are displayed for each physical link:

“interface showing as ‘Up’ is enp1s0f1np1 / enP2p1s0f1np1 (each physical port has two names).”

Thanks for the clarification! Can someone update the playbook please?
Because right after this it says:

Please disregard enP2p1s0f0np0 and enP2p1s0f1np1, and use enp1s0f0np0 and enp1s0f1np1 only.

Please clarify this more.

  1. Which interface is the “root port”, enP2p1s0f0np0? If it accesses “both interface ports”, and there are only two interface ports, why does enP2p1s0f1np1 exist? Just one enP2… interface should suffice.
  2. Does “aggregate all ports” mean creating, say, bond0 with all four interfaces, or just the two enP2… ones? Because XOR mode across two interfaces/physical ports transmits in XOR fashion (one port or the other, not both), which should be no different from using a single cable with one physical port on each Spark.

Thanks!


We’re working on that - you should see some updates to this playbook soon.


Any updates on that? Do we need to create bond0, or are the ports already aggregated at the firmware level?

Appears aggregated by default. Sampling with mlnx_perf on both enp1s0f0np0 and enP2p1s0f0np0 should show activity on both paths, even if enP2p1s0f0np0 has no IP.
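A sketch of that sampling, assuming mlnx_perf from MLNX_OFED is installed (interface names from this thread):

```shell
# Sample hardware counters on both PCIe halves of port 0 simultaneously.
# Traffic appearing on both interfaces indicates both x4 paths are in use,
# even though only one interface carries an IP address.
mlnx_perf -i enp1s0f0np0 &
mlnx_perf -i enP2p1s0f0np0
```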


@mnagy009 there are only two NICs attached to the Spark, thus the two sockets. Each NIC is attached to a different PCIe root complex. Here’s lspci -t output:

-[0000:00]---00.0-[01-0f]--+-00.0
                           \-00.1
-[0002:00]---00.0-[01-0f]--+-00.0
                           \-00.1

The first NIC is attached to domain 0000 and the second NIC to domain 0002. When you plug in the cable, both ports of each NIC will show link UP, but you set an IP on one port only for networking.

The first NIC is the left socket, the one next to the 10G port. And both NICs share the 200G bandwidth.

The PCIe mapping might be overwhelming if you’re only used to notebook/desktop setups.

Each NIC is dual-port:

NIC 1 = 0000:01:00.[0–1]

NIC 2 = 0002:01:00.[0–1]
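You can confirm the two domains directly with lspci (a generic sketch; 15b3 is the Mellanox/NVIDIA networking vendor ID):

```shell
# -D prints the full PCIe address including the domain,
# -d 15b3: filters to Mellanox/NVIDIA networking devices.
# On DGX Spark this should list functions under both 0000:01:00.x and 0002:01:00.x.
lspci -D -d 15b3:
```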


Has anyone connected the two Sparks with two cables, and are there any pros/cons to that setup? My second cable just arrived from Naddod, and I’m debating whether there’s any value in setting up two cables.

Edit:
Following this rabbit hole to see if 400G is actually feasible with two cables.

Current max achieved with two connected cables: ~26 GB/s (208 Gbps)
Single cable: ~200 Gbps (25 GB/s) theoretical
So the two cables hit the single-cable theoretical max.

Testing what happens if I enable GPUDirect RDMA, aiming for a potential 2x performance gain.

I’ve got similar numbers (195 Gbps over two links; busbw in my case is slightly higher, ~25 GB/s):

#       size         count      type   redop    root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)
   536870912     134217728     float    none       0  21982.6   24.42   24.42       0  23507.8   22.84   22.84       0
  1073741824     268435456     float    none       0  43953.5   24.43   24.43       0  43945.0   24.43   24.43       0
  2147483648     536870912     float    none       0  87839.0   24.45   24.45       0  88737.6   24.20   24.20       0
  4294967296    1073741824     float    none       0   175583   24.46   24.46       0   176853   24.29   24.29       0
Note that GPU utilization during the collective is 100% (well, 96%, with 4% idle), so maybe this is the maximum we can get?
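For anyone wanting to reproduce a table like the one above, this is roughly the nccl-tests invocation that sweeps the same message sizes (the binary path and choice of test are assumptions; use whichever test the Two Sparks playbook specifies):

```shell
# Sweep message sizes from 512 MiB to 4 GiB, doubling each step (-f 2),
# with one GPU per process (-g 1). Run under mpirun across both Sparks
# per the playbook; all_reduce_perf is just one example nccl-tests binary.
./build/all_reduce_perf -b 512M -e 4G -f 2 -g 1
```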

Can you clarify the 26 GB/s? In your screenshot it’s 12.89 GB/s.


Actually, you are getting 100 Gbps using both cables; you need to look at busbw. To get 200 Gbps with two cables, you would likely need to create a bond first.

Despite your optimistic LLM telling you what you want to hear, you won’t be able to achieve 400G using two ports: when both ports are active, each gets 100G. DGX Spark doesn’t have enough PCIe lanes to achieve 400G, and even to achieve 200G on a single port, they had to do some voodoo with the dual-NIC bonding due to the PCIe limitations of this architecture.

Oh, and GPUDirect RDMA is not implemented on Spark, according to NVIDIA; there were some posts from NVIDIA explaining why, and I believe it is buried somewhere in the documentation too.

I guess my point is that DGX Spark is a very new platform, so LLMs will not know much about it, and the risk of hallucinations is very high.