Hello NVIDIA team,
I am working with two DGX Spark Mini systems and am encountering a persistent issue where the ConnectX‑7 QSFP ports never power on and the nodes cannot establish a link. In addition, the diagnostic tool recommended by NVIDIA support (dgx-spark-fielddiag) cannot be installed because it does not appear in any available repository.
I am requesting engineering guidance on both issues.
1. QSFP Ports Never Power On / No Link Between Nodes
Both DGX Spark Minis show identical behavior:
ConnectX‑7 NIC enumerates correctly on PCIe (lspci shows the device).
mlx5_core loads without errors.
Firmware version is visible.
Cable insertion/removal events appear in dmesg.
QSFP cages never power up.
No network interfaces (p7p1, mlx5_0, etc.) appear in ip link.
Link state remains DOWN at all times.
Key repeated messages in the logs:
Detected insufficient power on the PCIe slot (27W)
QSFP module not powered
Port module: cable unplugged
This occurs even when the cable is fully inserted.
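For anyone trying to reproduce this, the observations above can be collected with something like the sketch below. It is only a sketch: the function names are mine (hypothetical), the match patterns are taken from the three messages quoted above, and 0x15b3 is the Mellanox/NVIDIA PCI vendor ID. Note that names such as mlx5_0 are normally RDMA device names and show up under /sys/class/infiniband, not in ip link.

```shell
#!/bin/sh
# Hypothetical helpers; the patterns come from the messages quoted in this post.

# Read kernel log text on stdin and count the three recurring messages.
summarize_mlx5_log() {
    awk '
        /insufficient power on the PCIe slot/ { power++ }
        /module not powered/                  { unpowered++ }
        /cable unplugged/                     { unplugged++ }
        END {
            printf "insufficient_power=%d module_not_powered=%d cable_unplugged=%d\n",
                   power, unpowered, unplugged
        }'
}

# Print network interfaces whose backing PCI device has vendor ID 0x15b3.
list_mlx_netdevs() {
    for dev in /sys/class/net/*; do
        [ -r "$dev/device/vendor" ] || continue
        if [ "$(cat "$dev/device/vendor")" = "0x15b3" ]; then
            basename "$dev"
        fi
    done
    return 0
}
```

Typical use on the affected node would be `sudo dmesg | summarize_mlx5_log`, then `list_mlx_netdevs`, `ls /sys/class/infiniband`, and `sudo lspci -d 15b3: -vv` for the PCIe capability dump. An empty result from `list_mlx_netdevs` matches the symptom described here.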
Cable used
Amphenol NJAAKK‑N911 (0.4 m). I understand this is not an NVIDIA‑qualified QSFP112 cable, but even with an unsupported cable, the QSFP cage should still power on if PCIe power delivery is correct.
Because both DGX Spark Minis show the same 27W PCIe power limit and identical QSFP behavior, this appears to be a platform‑level PCIe power delivery issue rather than a single faulty NIC.
2. Diagnostic Tool Requested by NVIDIA Support Cannot Be Installed
NVIDIA support requested that I run the DGX Spark diagnostic tool (dgx-spark-fielddiag). However, after restoring the APT sources and running:
sudo apt update
sudo apt install dgx-spark-fielddiag
APT returns:
E: Unable to locate package dgx-spark-fielddiag
Repository state
The only DGX-related repo present is dgx.sources in /etc/apt/sources.list.d/.
This repo provides packages such as dgx-repo, dgx-spark-mlnx-hotplug, and dgx-spark-oobe-customize.
dgx-spark-fielddiag is not present in this repository.
Attempting to use the URL:
https://developer.download.nvidia.com/dgx/repos/spark/ubuntu
results in:
The repository does not have a Release file
and APT disables it.
This suggests the diagnostic tool is part of the DGX Spark OS factory image or a private repository, not the public DGX repo. Since this system no longer has the factory OS image, I cannot install the diagnostic tool required for your troubleshooting workflow.
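The repository checks above can be repeated with a sketch like the following. The helper names are hypothetical, and the suite name passed to `release_url` is an assumption, since I do not know which suite that URL is meant to serve. APT looks for `dists/<suite>/InRelease` (falling back to `Release`) under the repository base URL, and the "does not have a Release file" error means it found neither.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the APT repository configuration.

# Build the URL of the signed index APT would request for a given suite.
# $1 = repository base URL, $2 = suite name (an assumption, e.g. "noble").
release_url() {
    base=${1%/}   # drop one trailing slash, if present
    suite=$2
    printf '%s/dists/%s/InRelease\n' "$base" "$suite"
}

# Print every "URIs:" value from deb822-style .sources files in a directory
# (defaults to /etc/apt/sources.list.d, where dgx.sources lives).
list_source_uris() {
    dir=${1:-/etc/apt/sources.list.d}
    for f in "$dir"/*.sources; do
        [ -f "$f" ] || continue
        sed -n 's/^URIs:[[:space:]]*//p' "$f"
    done
}
```

For example, `list_source_uris` shows which repositories are actually configured, `apt-cache policy dgx-spark-fielddiag` shows whether any of them offers the package, and `curl -fsI "$(release_url https://developer.download.nvidia.com/dgx/repos/spark/ubuntu noble)"` checks whether an index exists for the assumed suite.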
3. Request for Engineering Guidance
I need assistance with the following:
Can a 27W PCIe power limit on DGX Spark Mini prevent the ConnectX‑7 QSFP112 cage from powering on?
Should the Spark Mini supply the full PCIe power budget required for QSFP112 modules?
Is there a BIOS, firmware, or platform configuration required to enable full PCIe power?
Does this behavior indicate a hardware issue with the DGX Spark Mini motherboard or PCIe power delivery?
How can I obtain the DGX Spark Mini OS recovery image or the private repository that contains dgx-spark-fielddiag so I can run the diagnostics you requested?
I can provide full dmesg, lspci -vv, NIC firmware logs, and system snapshots if needed.
Thank you. I would appreciate any guidance from the DGX Spark engineering team on whether this is a platform power issue, a firmware issue, or a hardware fault, and on how to restore the diagnostic environment.