System Information:
- Hardware: ASUS Ascent GX10 (DGX Spark)
- OS: NVIDIA DGX OS 7.2.3 (Base), updated to the latest kernel
- BIOS Version: 0102
- EC Version: 2.75.3.3
- TPM Status: Enabled (SHA384 bank active, /dev/tpm0 visible)

Issue Description: The system is permanently stuck in Hardware Safety Mode, capping total power draw at ~30W regardless of workload. GPU temperatures never exceed 55°C under full synthetic load (gpu-burn or PyTorch GEMM). Although the 240W power adapter is connected to the correct PD-IN port, the Power Delivery controller seems unable to negotiate a high-power contract (20V/5A+ or 28V).
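
One way to cross-check what the kernel believes was negotiated is to read the power-supply entries in sysfs. The sketch below assumes the PD-IN port is exposed under `/sys/class/power_supply` with `voltage_now` (microvolts) and `current_max` (microamps) attributes, as the UCSI driver does on many machines; whether this platform exposes them is not confirmed here, and the helper names `negotiated_watts` and `report` are hypothetical.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def negotiated_watts(supply_dir):
    """Estimate contract power from voltage_now (uV) and current_max (uA).

    Assumption: both attributes exist for the port; returns None otherwise.
    """
    d = Path(supply_dir)
    microvolts = read_int(d / "voltage_now")
    microamps = read_int(d / "current_max")
    if microvolts is None or microamps is None:
        return None
    # uV * uA = 1e-12 W
    return microvolts * microamps / 1e12


def report(root="/sys/class/power_supply"):
    """Return (name, watts or None) for every entry under root."""
    root = Path(root)
    if not root.is_dir():
        return []
    return [(d.name, negotiated_watts(d)) for d in sorted(root.iterdir())]


if __name__ == "__main__":
    for name, watts in report():
        print(name, "n/a" if watts is None else f"{watts:.1f} W")
```

A value near 100 W or above would suggest a 20V/5A contract was reached, while a much lower figure would point toward a fallback contract. An empty result only means the port is not exposed through this interface.
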
I just found conclusive evidence in dmesg that not only the GPU but the entire PCIe subsystem is power-starved. The Mellanox (mlx5) network controllers are reporting “insufficient power” and seem capped at around 27W, which is consistent with the total system draw of ~30W I am observing.
**Does this log confirm that the PMIC (Power Management IC) is actively restricting the entire main rail due to the PD negotiation failure?**
hasan@gx10-4b66:~/Documents/ft_qwen3_vl$ sudo dmesg | grep -iE "limit|throttle|power"
[ 1.107646] pci 000f:01:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 000f:00:00.0
[ 2.555611] mlx5_core 0000:01:00.0: mlx5_pcie_event:326:(pid 165): Detected insufficient power on the PCIe slot (27W).
[ 3.031528] mlx5_core 0000:01:00.1: mlx5_pcie_event:326:(pid 12): Detected insufficient power on the PCIe slot (27W).
[ 3.509043] mlx5_core 0002:01:00.0: mlx5_pcie_event:326:(pid 396): Detected insufficient power on the PCIe slot (27W).
[ 3.983977] mlx5_core 0002:01:00.1: mlx5_pcie_event:326:(pid 12): Detected insufficient power on the PCIe slot (27W).
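
For anyone comparing against their own unit, the mlx5 lines above can be pulled out of a saved dmesg dump with a short script. This is only a sketch: the regular expression is written against the message format shown in this log, and the function name `insufficient_power_events` is hypothetical.

```python
import re

# Matches lines such as:
# mlx5_core 0000:01:00.0: ...: Detected insufficient power on the PCIe slot (27W).
PATTERN = re.compile(
    r"mlx5_core (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): .*"
    r"Detected insufficient power on the PCIe slot \((?P<watts>\d+)W\)"
)


def insufficient_power_events(log_text):
    """Return a list of (PCI address, reported watts) for each matching line."""
    events = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            events.append((match.group("bdf"), int(match.group("watts"))))
    return events


# Two lines copied from the log above.
SAMPLE = """\
[ 2.555611] mlx5_core 0000:01:00.0: mlx5_pcie_event:326:(pid 165): Detected insufficient power on the PCIe slot (27W).
[ 3.031528] mlx5_core 0000:01:00.1: mlx5_pcie_event:326:(pid 12): Detected insufficient power on the PCIe slot (27W).
"""

if __name__ == "__main__":
    for bdf, watts in insufficient_power_events(SAMPLE):
        print(bdf, watts)
```

The log alone does not establish whether the 27W figure is a measured draw or a limit advertised for the slot, so the parsed values are best read as what the driver reported for each function.
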
