Avoid ASUS GX10s If You Want Any Support

So I’ve had an ongoing problem with my ASUS GX10s: none of the 4 will activate their ConnectX-7 ports. It’s been 5 weeks since my initial report of problems, which included detailed memory maps, field-diag, reports, re-images, etc., all done by me trying to troubleshoot the problem. The entire time, all I’ve gotten from ASUS Support is “thanks for your message, we will follow up in 1-2 business days.” I did get a couple of messages asking me to run MS Windows diagnostic tools, repeated requests that I provide video evidence of the problem (WTF), and repeated requests for screenshots of my BIOS and firmware information, AKA the Level 1 support runaround.

Today they informed me, “This is an Nvidia firmware problem, not an ASUS problem; you’ll need to open a ticket with Nvidia.” Now to me this is like Ford telling me to contact Bosch to solve a problem with a sensor in my car, and a complete abdication of their responsibility to me as the customer.

Then later they requested I ship them my GX10s so they can troubleshoot them, even though they said their product team had confirmed it’s a problem across the board. So what do they need my GX10s for? I asked them to verify they have GX10s that actually work and to RMA my units for working ones; they again stated that would not solve my problem, but that I should send them my GX10s for troubleshooting.

Net-net, I would not advise purchasing GX10s from ASUS. If you’re bent on a GB10/Spark clone, go to anyone but ASUS, because their support is awful.

Dunno about 4x GX10, but my 2 GX10s talk fine over QSFP56 using the Nvidia playbook for setting up networking.

Maybe try a QSFP56 cable instead of QSFP112, from a known-good vendor? I don’t know otherwise.

I have tried QSFP56 as well, but the issue is that the firmware ejects the ConnectX-7 cards from the system, so they can’t even sense the cable being plugged in :(
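
For anyone trying to narrow this down, here is a rough sketch (not taken from the Nvidia playbook) for checking whether the ConnectX-7 is still enumerated on the PCIe bus at all. It assumes lspci is installed and that the NIC is listed under the Mellanox vendor string, which is how ConnectX parts normally appear; count_mellanox is a made-up helper name.

```shell
#!/usr/bin/env bash
# Count PCI functions whose lspci description mentions Mellanox, the vendor
# string ConnectX NICs are normally listed under (assumption for the GX10).
# Reads lspci-style text on stdin. count_mellanox is a hypothetical helper.
count_mellanox() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    grep -ci 'mellanox' || true
}

# Only query the live system when lspci is actually available.
if command -v lspci >/dev/null 2>&1; then
    found=$(lspci | count_mellanox)
    echo "Mellanox/ConnectX PCI functions visible: ${found}"
fi
```

A count of 0 after boot would be consistent with the cards having dropped off the bus. If so, `sudo dmesg | grep -i mlx5` may show whether the mlx5_core driver logged anything before that happened.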

There was a new firmware update yesterday, I think… give it a shot? It’s not posted on their support site yet, but it prompted me when I logged in.

I got hit by the ridiculous power-capping bug which annoyed the hell out of me. Hoping it fixes that permanently.

Yeah firmware didn’t fix it :( and sadly I think I saw a few people saying they still got the powercap bug :((

How do we fix the powercap issue?

Unplug for a minute and plug back in. It’s some issue negotiating with the AC adapter/USB-C.

More than 1 minute. Say 5.

Are your cables definitely of the correct type? I have a pair of GX10s and have had no issues at all with clustering them.

I have an ASUS GX10 as well. I didn’t connect 4 of them, but I did buy a “compatible” cable and ran a couple of loopback tests, and it works, including data transfer from 1 port across the cable to the other port. So as others have mentioned, it might be a cable problem instead of the ASUS.

Same here, 0 issues, and I have run loads for synthetic data generation for weeks.

So: 2 different Amphenol and 4 different Naddod cables, all confirmed to be the correct product numbers. 2 of the Naddod cables were shipped back and tested by Naddod as good on working Sparks, and the last 2 were confirmed by Naddod on Sparks before they shipped out. All 4 GX10s exhibiting the exact same problem points to either a mfg problem or a driver problem.

My biggest issue isn’t whose fault it is, but rather ASUS spending 5+ weeks running me around on level 1 support, including support people trying to get me to run Windows diags on a Linux box and repeated requests for screenshots of the BIOS, etc., only to then punt and tell me to open a ticket with Nvidia, rather than ASUS, who mfg’d and sold the units, opening a ticket with Nvidia or ever even getting me to someone besides level 1 support. Thus the “if you want support” framing of the post.

Let me suggest a blue-sky-thinking approach: install the Claude CLI on the DGX Spark and ask it to debug the problem for you. It is very likely a hardware problem and you got a bad unit.

The downside is that you have to pay 20 bucks for this.

Also confirming no problems at the moment between the gold PNY model and GX10. I’m using an Amphenol cable.

I did encounter the 16GB/sec bug, as others have mentioned. The fix for the GX10 is firmware version 0x3000006. This got me back to 24GB/sec between the two systems.

According to this, as of today there’s actually a 0x3000007 available:

fwupdmgr requires you to specifically allow these unstable builds, otherwise they won’t show up during the normal check:

sudo fwupdmgr enable-remote lvfs-testing
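
For completeness, a minimal sketch of the whole sequence around that command, using only standard fwupdmgr subcommands. It prints the commands by default (DRY_RUN=1), since this touches firmware; run and update_from_testing are made-up helper names, and whether the 0x3000007 build is actually offered on a given unit is not something this checks.

```shell
#!/usr/bin/env bash
# Sketch of a firmware update through the LVFS testing remote.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run them.
# run and update_from_testing are hypothetical helper names.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_from_testing() {
    # Allow pre-release firmware from the testing remote.
    run sudo fwupdmgr enable-remote lvfs-testing || return 1
    # Re-download metadata so the newly enabled remote is consulted.
    run sudo fwupdmgr refresh --force || return 1
    # List pending updates; stop here if this exits non-zero.
    run fwupdmgr get-updates || return 1
    # Apply the updates (may prompt and may require a reboot).
    run sudo fwupdmgr update || return 1
}

update_from_testing
```

To go back to stable-only firmware checks afterwards, `sudo fwupdmgr disable-remote lvfs-testing` turns the testing remote off again.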

Claude, Gemini, and ChatGPT, all in their deepest thinking modes and probing down to the hardware layer, say it’s a firmware driver issue, which is what ASUS “support” said it was :(

Yeah, the gold PNY one is the Amphenol cable I have tested with :(

I’ll give the fwupd with the testing flag a try, couldn’t hurt

What’s the Amphenol part number of the cable?

NJAAKK-N911

I was looking at getting the ASUS GX10 and eventually getting a 2nd for a stack. I’m curious about the manufacturing dates of the ones that are working in a stack and the ones that are not.

I think we all feel your pain and understand the frustration, hence we’re just trying to throw out ideas. I’ve never worked in first-level support, but I can also see where they are coming from. The hardware and firmware came from Nvidia, and Nvidia probably has rules in place about what any of the vendors can do (guessing), and the Nvidia firmware does strict hardware checks. So the issue possibly isn’t a firmware bug, but rather that your cables are failing the firmware checks. I’m not sure any cable vendor would have the REAL servers or workstations (any or all brands and models) to do cable testing. Anyway, find someone nearby who has a working pair in use, and you can confirm once and for all.