I also ran dgx-spark-fieldiag. The final result was ‘PASS’, but summary.json shows that the ConnectX-7 was missing from the hardware inventory entirely during the test. It seems the diagnostic ‘passed’ only because it ignored the absent component.
One of the features in the latest software update is “New hot-plug support for the ConnectX-7 network adapter can save up to 18W when the adapter isn’t in use”, so you might be affected by this new behavior. Is the interface still missing while it is in use?
Hi @jasonaduclos, I want to understand the symptoms you were seeing. Did you try to use the CX7 card and find it unusable, or did you just not see it in the lspci output?
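For anyone wanting to script the lspci check, here is a minimal sketch. It assumes `lspci` (from pciutils) is installed and that the ConnectX-7 enumerates under the Mellanox PCI vendor ID `15b3`, which `lspci -nn` prints in its `[vendor:device]` tag. Given the hot-plug behavior described above, an empty result may only mean the adapter is currently powered down, not that it is faulty.

```python
import subprocess

# PCI vendor ID used by Mellanox (now NVIDIA networking) ConnectX adapters.
MELLANOX_VENDOR_ID = "15b3"


def find_mellanox_devices(lspci_output: str) -> list[str]:
    """Return the `lspci -nn` lines whose [vendor:device] tag uses the Mellanox vendor ID."""
    return [
        line
        for line in lspci_output.splitlines()
        if f"[{MELLANOX_VENDOR_ID}:" in line.lower()
    ]


def main() -> None:
    # -nn makes lspci print numeric [vendor:device] IDs next to the device names.
    try:
        out = subprocess.run(
            ["lspci", "-nn"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not run lspci: {exc}")
        return

    devices = find_mellanox_devices(out)
    if devices:
        print("\n".join(devices))
    else:
        # With hot-plug power saving, this can happen while the adapter is idle.
        print("No Mellanox/ConnectX device currently visible on the PCI bus")


if __name__ == "__main__":
    main()
```

Matching on the numeric vendor ID rather than the marketing name avoids depending on how a given pciutils version names the device.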
Hi @aniculescu, the unit was a “new” (open-box) unit. I was setting it up for the first time and did not see the card in the lspci output.
After reading the first comment about the hot-plug feature, I set up the second unit (which was new and unopened) and did a full cold boot on the “open box” unit. When I connected the two units, they worked without issue.
They have actually been running overnight. nvidia-smi has been showing 90% GPU-Util with minimal power usage (~40W) and GPU temperature staying under 60C. They are relatively quiet little machines, too!