Can a DGX Spark and AGX Thor connect using QSFP cables?

Hello! Can a DGX Spark and an AGX Thor be connected successfully with a QSFP DAC cable? As a test, when I plug the cable into the Spark’s 2 QSFP ports, the 4 enp1s0fXnpX ints that I manually assigned IPv4 /31 addrs to come up and are pingable. Similarly, when I plug one end into the AGX Thor’s one QSFP port, its 4 mgbeX_0 ints with the matching /31 subnet addrs come up. But then the Spark’s ints go down, and while Spark commands such as “sudo ip link set dev enp1s0f0np0 up” succeed, the Spark’s ints stay down. I know the Spark and the Thor are developed by different divisions of the company, but I am hoping there is a way they can be successfully connected with the DAC cable into a cluster. Thanks, Dan
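
For reference, the /31 pairing described above can be sanity-checked with a minimal Python sketch. The addresses are made-up examples, not the ones actually assigned, and `same_p2p_subnet` is a hypothetical helper name.

```python
import ipaddress


def same_p2p_subnet(local: str, peer: str) -> bool:
    """Return True if two interface addresses share one subnet and differ."""
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(peer)
    # A /31 holds exactly two addresses, one for each end of the link.
    return a.network == b.network and a.ip != b.ip


# Example (assumed) addresses for one Spark port and one Thor port.
print(same_p2p_subnet("192.168.100.0/31", "192.168.100.1/31"))  # True
print(same_p2p_subnet("192.168.100.0/31", "192.168.100.2/31"))  # False
```

A passing check only says the addressing is consistent; it says nothing about whether the physical link itself is up.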

Not sure I understand your issue. Are you connecting them directly?
When you say “pingable”, do you mean that you can ping the Spark from the Thor? Or is it pingable on the same machine?
What do you mean by “Spark ints go down”? When you shut down the machine, or when you shut down the interface using “ip link”?

Do you assign addresses manually on Thor too?

When you assign addresses, do you save them in the system configuration, or do you set them via “ip addr”?

Let’s not get confused by the IPv4 addressing, which works fine. The issue is that the DGX Spark and AGX Thor aren’t designed to automatically talk to each other over the QSFP DACs, just to another of the same type. With some commands, can they talk successfully?

Thanks, Dan

Not sure I understand what you are saying.
If they can ping each other by the IPs that you assigned to their QSFP interfaces, they can talk.

When you have both connected, what does ibdev2netdev show on your Spark?
What about ibstat?

They can be connected with a QSFP28 (Thor side) to QSFP56 (Spark side) cable. It’s just an Ethernet link! But you’ll be limited to 100 Gb because of the Thor. All you need is a QSFP28-to-QSFP56 cable. The Thor uses HSB (Holoscan Sensor Bridge), an RDMA replacement, and I’m not sure it will work with a Spark. Probably not. The Green Team can elaborate more. Paging @aniculescu!
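
The 100 Gb ceiling is just lane arithmetic, sketched below in Python. The figures are the nominal ones for each form factor (QSFP28 = 4 lanes at 25 Gb/s, QSFP56 = 4 lanes at 50 Gb/s), and the function names are hypothetical. It assumes both ends can actually agree on a common mode, which is the open question in this thread.

```python
# Nominal (lanes, Gb/s per lane) for each QSFP form factor.
PORT_LANES = {
    "QSFP28": (4, 25),  # Thor side
    "QSFP56": (4, 50),  # Spark side
}


def port_gbps(form_factor: str) -> int:
    """Nominal aggregate speed of one port."""
    lanes, per_lane = PORT_LANES[form_factor]
    return lanes * per_lane


def link_ceiling_gbps(a: str, b: str) -> int:
    """Upper bound for a link: the slower of the two ports."""
    return min(port_gbps(a), port_gbps(b))


print(link_ceiling_gbps("QSFP28", "QSFP56"))  # 100
```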

A likely issue is QSFP speed and adapter compatibility. On DGX Spark, QSFP ports (ConnectX-7) are optimized for validated use cases and QSFP-to-SFP(+) adapters are not officially tested, with reports of the Spark not detecting links unless very specific speeds and settings are forced. On the Jetson AGX Thor side, the QSFP port defaults to 10 GbE and must be explicitly reconfigured for 25 GbE, supporting only one speed at a time via BSP/kernel configuration. If speeds, lane modes, or optics don’t match exactly on both ends, the link simply won’t come up.

In practice, direct Spark<>Thor QSFP cabling is unlikely to work reliably. The proven approach is to use a switch that explicitly supports 10 GbE or 25 GbE Ethernet on both sides, with validated optics.
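
To see whether a port actually has carrier and what speed it came up at, the standard Linux sysfs files under /sys/class/net can be read on each box. The following is a minimal Python sketch, not an NVIDIA tool: `link_state` is a hypothetical helper, and the interface name is the one quoted earlier in the thread (on the Thor it would be one of the mgbeX_0 names).

```python
from pathlib import Path


def _read(path: Path):
    """Read a sysfs attribute, or None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        # Reading 'speed' or 'carrier' can fail while the interface is down.
        return None


def link_state(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Collect operstate, carrier and speed (Mb/s) for one interface."""
    base = Path(sysfs_root) / iface
    try:
        speed = int(_read(base / "speed"))
    except (TypeError, ValueError):
        speed = None
    if speed is not None and speed < 0:
        speed = None  # the kernel reports -1 when the speed is unknown
    return {
        "iface": iface,
        "operstate": _read(base / "operstate"),
        "carrier": _read(base / "carrier"),
        "speed_mbps": speed,
    }


print(link_state("enp1s0f0np0"))
```

Running this on both machines while the cable is connected shows whether the two ends report the same speed, or whether one side never gets carrier at all.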

The spec sheet for Thor lists 1x QSFP28 (4x 25 GbE) with a max port speed of 100Gbps. But the software stack is what’s different even though HSB is a kind of RDMA.

Quick comparison, thanks to Ms. Gemini:

Feature           Jetson AGX Thor (Native)    DGX Spark (ConnectX-7)
------------------------------------------------------------------------
RDMA / RoCE       No (uses IEEE 1722/HSB)     Yes (Native RoCE support)
Protocol          Ethernet / HSB              InfiniBand / Ethernet
Software Stack    JetPack / Holoscan          DOCA / MLNX_OFED
Max Port Speed    100 Gbps (4x 25 GbE)        200 Gbps

I don’t have a Thor, so I can’t test it. All my Nvidia devices before the Sparks are Orin Nanos.

Yeah, in this case, it may still be usable over Ethernet (via a switch, I guess?), just not for RDMA or any other ultra-low latency workloads.