Could anyone suggest an appropriate cable to link two Sparks? They will sit on top of each other, so I’d prefer a short cable.
FS.com sells a 0.5m (1.6ft) NVIDIA/Mellanox MCP1650-V00AE30 that’s on NVIDIA’s list of approved devices.
Ref:
While waiting for my Mellanox cable — you can connect two Sparks over ethernet and have NCCL work just fine. It’s slow of course, but it’s fast enough to prototype with a relatively small model.
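For anyone trying the Ethernet fallback described above, here is a minimal sketch of the environment settings that usually steer NCCL onto a plain socket interface. `NCCL_SOCKET_IFNAME`, `NCCL_IB_DISABLE` and `NCCL_DEBUG` are standard NCCL environment variables; the helper name `nccl_socket_env` and the interface name `eth0` are placeholders of mine, so check `ip link` for the real interface on your Spark.

```python
import os


def nccl_socket_env(ifname: str) -> dict:
    """Return environment overrides that point NCCL at a TCP/Ethernet interface.

    `ifname` is whatever `ip link` reports for the wired port; "eth0" below
    is only a placeholder, not the actual interface name on a Spark.
    """
    return {
        "NCCL_SOCKET_IFNAME": ifname,  # interface NCCL uses for socket traffic
        "NCCL_IB_DISABLE": "1",        # skip the IB/RoCE transport, use sockets
        "NCCL_DEBUG": "INFO",          # log which transport NCCL actually picked
    }


# Merge the overrides into a copy of the current environment, e.g. to pass
# as env= to subprocess.run() when launching the job on each node.
env = {**os.environ, **nccl_socket_env("eth0")}
print({k: env[k] for k in nccl_socket_env("eth0")})
```

The same variables need to be set on both machines before launching, and `NCCL_DEBUG=INFO` makes it easy to confirm from the log that the socket transport is in use.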
PNY DGX Spark Stacking Cable
Mfr Part# NJAAKKR-0006 (30 AWG)
Mfr Part# NJAAKK-0006 (32 AWG)
You can find them at Micro Center for $99.00. Mine, purchased at Micro Center, was made by Amphenol:
https://www.amphenol-cs.com/product/njaakr0006.html
https://www.amphenol-cs.com/product/njaakk0006.html
Other than the wire gauge they have the same specs - at 0.5m either should be fine.
There are two approved cable SKUs:
- Amphenol: NJAAKK-N911 (QSFP to QSFP112, 32AWG, 400mm, LSZH) https://www.amphenol-cs.com/product/njaakk0006.html (0.5m version)
- Luxshare: LMTQF022-SD-R (QSFP112 400G DAC Cable, 400mm, 30AWG)
Warning: this is not a good reference. You cannot use any random cable from this list. I bought the MCP1650-V00AE30 and it does not work.
I just ordered the NVIDIA/Mellanox MCP1650-V00AE30 Compatible 0.5m (1.6ft) 200G QSFP56 Passive Direct Attach Copper Twinax Cable 30AWG for DGX Spark Dual-System Interconnect from NADDOD. It is advertised as being compatible and they tell me they test each cable before shipping. I will let you know if it works.
I attended the Nvidia GTC in Washington last week and nobody–I mean nobody–had these cables for sale at the show, although a Dell guy tried to see if he could sell me one of their extras [but no]. An Nvidia guy I talked to said they had underestimated the demand and had only a single source.
There is a wide range of prices, too. The one I list above is $66, but shipping is $25 and I think it’s coming from China. Perhaps I will get a customs bill as well. But I ordered it yesterday and just got a notice from FedEx.
Here’s the link: Naddod cable
The cable arrived yesterday and it passed the nccl-test suite. Speed is 20 - 22 GB (not Gb) per second which is about the most we can expect. So far, seems like a good buy.
Since the cables are out of stock at Microcenter and Amazon as well as Naddod, I found availability at FX for $88 with free shipping from the US, arriving next week. They also ship internationally via FedEx.
Also ran into this one from Complete Connect for UK/Europe-based folks, at £88:
https://www.completeconnect.co.uk/product/400g-dac-qsfp112-cables-for-nvidia-passive/
I wonder if any of these alternative cables are capable of achieving 200G on Sparks? The “official” cable is 112G/lane so that would give up to 400Gb/s.
Can anyone aggregate virtual ports and test throughput? I was going to test it myself, but since our Microcenter ran out of officially supported cables, I postponed my purchase until someone can confirm that 200G is achievable (or at least closer to it).
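The lane arithmetic behind that question can be sketched as below. The per-lane figures are the nominal data rates usually associated with each form factor (4 lanes at roughly 50 Gb/s for QSFP56, 4 lanes at roughly 100 Gb/s for the "112G" class QSFP112); treat them as my assumptions. This only shows what a cable is rated for, not what the Spark can actually drive over it.

```python
# Nominal per-lane data rates in Gb/s (assumed from the form-factor class,
# not measured on a Spark). "112G" lanes carry about 100 Gb/s of payload.
LANE_RATE_GBPS = {"QSFP56": 50, "QSFP112": 100}
LANES_PER_CABLE = 4  # both form factors use four electrical lanes


def nominal_cable_rate_gbps(form_factor: str) -> int:
    """Rated aggregate data rate of a 4-lane cable of the given form factor."""
    return LANE_RATE_GBPS[form_factor] * LANES_PER_CABLE


for form_factor in LANE_RATE_GBPS:
    print(f"{form_factor}: {nominal_cable_rate_gbps(form_factor)} Gb/s nominal")
```

On paper, then, a QSFP56 cable is rated for 200 Gb/s and a QSFP112 cable for 400 Gb/s; whether the Spark links at full rate on a QSFP56 cable is exactly the open question in this thread.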
Naddod are selling, or more like scalping, Sparks at that price, so I assume the cable is the same one from the NVIDIA bundle. FS had the same cable specs.
I thought we couldn’t push to 400G aggregate and that we maxed out at 200G.
I saw 400G mentioned and bookmarked a post about merging ports so that one cable would work. I will test the port merge tomorrow, and a second cable between the two Sparks next week.
Wow, Spark is hard to justify at $4K, no way I would buy at that price :)
The cable is still $95 with 10-day shipping, but there’s not enough desperation for anyone to pay that DGX Spark price markup, especially with partners selling units at discount rates with 0% APR payment offers.
Well, a direct equivalent to the “official” stacking cable seems to be this one: NVIDIA/Mellanox Compatible 0.5m (1.6ft) 400G QSFP112 Passive Direct Attach Cable - NADDOD
@eugr @PrinceHal got that cable and said it worked. It might be worth testing whether it can carry 400G or not.
Thanks for sharing the direct equivalent!
@PrinceHal how’s the cable working for you so far?
Found another one on FS.com - @NVES - could you please confirm whether this cable’s specs match the official one, as the official one is pretty much unobtanium now? https://www.fs.com/products/149312.html?attribute=26053&id=3713425
EDIT: no, this one is not it. It’s QSFP-DD, not QSFP112, so it won’t be compatible.
Still, a question to @NVES and other NVIDIA folks: besides those two “official” SKUs, what exact specs are needed to achieve 200Gbps on a single port? Is QSFP56 enough? Or does it have to be QSFP112, like the “official” ones? I would assume that QSFP56 ones would work, but wanted to double check, as Spark is not exactly a standard ConnectX-7 configuration…
The NADDOD cable seems to work fine. Now I have to learn how to use it, as in running vLLM with larger models across the two linked Sparks, or fine-tuning or quantizing a larger model with 256GB of combined RAM.
Have you had a chance to aggregate two “virtual” ports and achieve 200 Gbps between two sparks in benchmarks?
I have two Sparks connected by one cable. I followed Nvidia’s directions in Spark Clustering and Stacked Sparks. Running the nccl-test script from the Stacked Sparks link I obtained 20 - 22 GB / sec transfers. Take 8 bits in a byte and allow for other overhead and that’s within reasonable expectations for a 200Gbit link. I hear it might be possible to bind two links to double the speed but would need another cable to try. When I hear for sure that people have done it I might well order another cable. I hope that answers your question (I use “bind” where you use “aggregate virtual ports” perhaps).
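To make the bytes-versus-bits arithmetic above explicit, here is a small sketch that converts the reported nccl-test bandwidth into Gbit/s and compares it with a 200 Gbit/s line rate. The function names are mine, and the 200 Gbit/s figure is the nominal link speed discussed in this thread, not something measured here.

```python
def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert a bandwidth in GB/s to Gbit/s (8 bits per byte)."""
    return gbytes_per_s * 8


def link_utilization(gbytes_per_s: float, line_rate_gbits: float = 200.0) -> float:
    """Fraction of the nominal line rate that a measured bandwidth represents."""
    return gbytes_to_gbits(gbytes_per_s) / line_rate_gbits


# The two ends of the range reported by the nccl-test run above.
for measured in (20.0, 22.0):
    print(
        f"{measured} GB/s = {gbytes_to_gbits(measured):.0f} Gbit/s, "
        f"{link_utilization(measured):.0%} of a 200 Gbit/s link"
    )
```

So 20 to 22 GB/s works out to 160 to 176 Gbit/s, or 80% to 88% of the nominal rate, which is consistent with a single 200 Gbit link once protocol overhead is taken into account.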
