knaidu
December 2, 2025, 10:05pm
1
I see PCIe power-related errors on all four ConnectX-7 NICs at every boot, as well as very frequent correctable errors. Has anyone else seen these on their systems? Can they be ignored, or do they point to a hardware issue?
mlx5_core 0000:01:00.0: Detected insufficient power on the PCIe slot (27W)
Type: Correctable, Physical Layer, Receiver ID
Error Status: 0x00000001 (Bit 0 = RxErr)
Device: [10de:22ce] NVIDIA PCIe Root Complex
There was also an uncorrectable fault once, after which the system rebooted to recover.
pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal),
type=Transaction Layer, (Requester ID)
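For anyone trying to read the "Error Status" word in the correctable error above, here is a minimal sketch that decodes it into named bits. The bit positions follow the PCIe AER Correctable Error Status register (the same values as the `PCI_ERR_COR_*` constants in the Linux kernel headers). The function name `decode_cor_status` is made up for illustration.

```python
# Correctable Error Status bit positions from PCIe AER
# (same values as PCI_ERR_COR_* in the Linux kernel headers).
COR_STATUS_BITS = {
    0: "Receiver Error (RxErr)",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_cor_status(status):
    """Return the names of all known bits set in a correctable status word."""
    return [
        name
        for bit, name in sorted(COR_STATUS_BITS.items())
        if status & (1 << bit)
    ]


# The status value quoted in the log above.
print(decode_cor_status(0x00000001))
```

A value of 0x00000001 has only bit 0 set, which matches the "Bit 0 = RxErr" annotation in the log.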
eugr
December 2, 2025, 11:12pm
8
Same here - doesn’t seem to affect anything though.
eugr@spark2:~/spark-env$ sudo dmesg | grep mlx5_core
[sudo] password for eugr:
[ 2.172114] mlx5_core 0000:01:00.0: Adding to iommu group 14
[ 2.174699] mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)
[ 2.174831] mlx5_core 0000:01:00.0: firmware version: 28.45.4028
[ 2.174853] mlx5_core 0000:01:00.0: 126.028 Gb/s available PCIe bandwidth (32.0 GT/s PCIe x4 link)
[ 2.521947] mlx5_core 0000:01:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
[ 2.522462] mlx5_core 0000:01:00.0: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 2.523569] mlx5_core 0000:01:00.0: Flow counters bulk query buffer size increased, bulk_query_len(8)
[ 2.536146] mlx5_core 0000:01:00.0: mlx5e: IPSec ESP acceleration enabled
[ 2.542249] mlx5_core 0000:01:00.0: Port module event: module 0, Cable unplugged
[ 2.542860] mlx5_core 0000:01:00.0: mlx5_pcie_event:301:(pid 165): Detected insufficient power on the PCIe slot (27W).
[ 2.650297] mlx5_core 0000:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 2.652235] mlx5_core 0000:01:00.1: Adding to iommu group 15
[ 2.655314] mlx5_core 0000:01:00.1: enabling device (0000 -> 0002)
[ 2.655452] mlx5_core 0000:01:00.1: firmware version: 28.45.4028
[ 2.655473] mlx5_core 0000:01:00.1: 126.028 Gb/s available PCIe bandwidth (32.0 GT/s PCIe x4 link)
[ 3.006073] mlx5_core 0000:01:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
[ 3.006590] mlx5_core 0000:01:00.1: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 3.007911] mlx5_core 0000:01:00.1: Flow counters bulk query buffer size increased, bulk_query_len(8)
[ 3.023174] mlx5_core 0000:01:00.1: mlx5e: IPSec ESP acceleration enabled
[ 3.026243] mlx5_core 0000:01:00.1: Port module event: module 1, Cable plugged
[ 3.026616] mlx5_core 0000:01:00.1: mlx5_pcie_event:301:(pid 11): Detected insufficient power on the PCIe slot (27W).
[ 3.140648] mlx5_core 0000:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 3.143176] mlx5_core 0002:01:00.0: Adding to iommu group 16
[ 3.146107] mlx5_core 0002:01:00.0: enabling device (0000 -> 0002)
[ 3.146251] mlx5_core 0002:01:00.0: firmware version: 28.45.4028
[ 3.146272] mlx5_core 0002:01:00.0: 126.028 Gb/s available PCIe bandwidth (32.0 GT/s PCIe x4 link)
[ 3.496129] mlx5_core 0002:01:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
[ 3.496647] mlx5_core 0002:01:00.0: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 3.497736] mlx5_core 0002:01:00.0: Flow counters bulk query buffer size increased, bulk_query_len(8)
[ 3.510311] mlx5_core 0002:01:00.0: mlx5e: IPSec ESP acceleration enabled
[ 3.516645] mlx5_core 0002:01:00.0: Port module event: module 0, Cable unplugged
[ 3.517214] mlx5_core 0002:01:00.0: mlx5_pcie_event:301:(pid 165): Detected insufficient power on the PCIe slot (27W).
[ 3.617811] mlx5_core 0002:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 3.619225] mlx5_core 0002:01:00.1: Adding to iommu group 17
[ 3.622205] mlx5_core 0002:01:00.1: enabling device (0000 -> 0002)
[ 3.622369] mlx5_core 0002:01:00.1: firmware version: 28.45.4028
[ 3.622391] mlx5_core 0002:01:00.1: 126.028 Gb/s available PCIe bandwidth (32.0 GT/s PCIe x4 link)
[ 3.970440] mlx5_core 0002:01:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
[ 3.970958] mlx5_core 0002:01:00.1: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 3.972270] mlx5_core 0002:01:00.1: Flow counters bulk query buffer size increased, bulk_query_len(8)
[ 3.988272] mlx5_core 0002:01:00.1: mlx5e: IPSec ESP acceleration enabled
[ 3.991896] mlx5_core 0002:01:00.1: Port module event: module 1, Cable plugged
[ 3.992259] mlx5_core 0002:01:00.1: mlx5_pcie_event:301:(pid 11): Detected insufficient power on the PCIe slot (27W).
[ 4.094650] mlx5_core 0002:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 4.098598] mlx5_core 0000:01:00.0 enp1s0f0np0: renamed from eth0
[ 4.098977] mlx5_core 0002:01:00.0 enP2p1s0f0np0: renamed from eth2
[ 4.099990] mlx5_core 0000:01:00.1 enp1s0f1np1: renamed from eth1
[ 4.100506] mlx5_core 0002:01:00.1 enP2p1s0f1np1: renamed from eth3
[ 8.116683] mlx5_core 0002:01:00.0 enP2p1s0f0np0: Link down
[ 8.355688] mlx5_core 0002:01:00.1 enP2p1s0f1np1: Link up
[ 8.577558] mlx5_core 0000:01:00.0 enp1s0f0np0: Link down
[ 8.781281] mlx5_core 0000:01:00.1 enp1s0f1np1: Link up
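To check quickly whether every ConnectX function reports the same warning, a small script along these lines could pull the affected PCI addresses and the reported wattage out of saved `dmesg` output. This is only a sketch: the regular expression is written against the message format shown in the log above, and the helper name `power_warnings` is hypothetical.

```python
import re

# Written against lines such as:
# [ 2.542860] mlx5_core 0000:01:00.0: mlx5_pcie_event:301:(pid 165):
#   Detected insufficient power on the PCIe slot (27W).
POWER_WARNING = re.compile(
    r"mlx5_core (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r".*Detected insufficient power on the PCIe slot \((?P<watts>\d+)W\)"
)


def power_warnings(dmesg_text):
    """Map each PCI address that logged the warning to the reported watts."""
    found = {}
    for line in dmesg_text.splitlines():
        match = POWER_WARNING.search(line)
        if match:
            found[match.group("bdf")] = int(match.group("watts"))
    return found


# Two lines copied from the log above, plus one unrelated line.
sample = """\
[ 2.542860] mlx5_core 0000:01:00.0: mlx5_pcie_event:301:(pid 165): Detected insufficient power on the PCIe slot (27W).
[ 3.026243] mlx5_core 0000:01:00.1: Port module event: module 1, Cable plugged
[ 3.992259] mlx5_core 0002:01:00.1: mlx5_pcie_event:301:(pid 11): Detected insufficient power on the PCIe slot (27W).
"""

for bdf, watts in power_warnings(sample).items():
    print(f"{bdf}: {watts}W")
```

In the log above, the warning appears for all four functions regardless of whether a cable is plugged in.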
elsaco
December 3, 2025, 12:29am
9
If you check one of the ConnectX ports, you’ll notice the power limit is 0W.
From lspci -vvv:
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
That spooks the driver, since a full x16 PCIe slot would provide 75W of power and the driver expects that. Keep in mind that the port is connected to an x4 link only and there’s no declared power limit, at least according to the lspci output.
Thank you for bringing this up. We will investigate this issue.
elsaco’s explanation is correct: since the host does not advertise a power limit, the driver triggers this warning.
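If you want to confirm the advertised limit on your own unit, the value can be read from saved `lspci -vvv` output. The sketch below assumes the `SlotPowerLimit <n>W` field format shown in the excerpt above; the helper name `slot_power_limit` is hypothetical.

```python
import re
from typing import Optional

# Accepts both "SlotPowerLimit 0W" and a fractional form such as "75.000W".
SLOT_POWER = re.compile(r"SlotPowerLimit\s+(?P<watts>\d+(?:\.\d+)?)\s*W")


def slot_power_limit(lspci_text: str) -> Optional[float]:
    """Return the first SlotPowerLimit value in watts, or None if absent."""
    match = SLOT_POWER.search(lspci_text)
    return float(match.group("watts")) if match else None


# Excerpt copied from the lspci -vvv output quoted above.
sample = """\
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
"""

limit = slot_power_limit(sample)
if limit is None:
    print("No SlotPowerLimit field found")
elif limit == 0:
    print("Host advertises no slot power limit (0W)")
else:
    print(f"Advertised slot power limit: {limit}W")
```

To feed it real data, you could save the output of `sudo lspci -vvv -s 0000:01:00.0` to a file and pass the contents in.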