CX5 - bad system state

I’m working with Xilinx PetaLinux on a Xilinx PG213 core acting as the root complex, so in general I have no confidence in either the HW or the SW.

The CX5 gets fairly far through initialization before it fails with:

[ 4.447417] pci 0000:01:00.0: calling mellanox_check_broken_intx_masking+0x0/0x168
[ 4.454965] mlx5_core 0000:01:00.0: runtime IRQ mapping not provided by arch
[ 4.462017] mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)
[ 4.468151] mlx5_core 0000:01:00.0: enabling bus mastering
[ 4.473941] mlx5_core 0000:01:00.0: firmware version: 16.22.1002
[ 4.700002] mlx5_core 0000:01:00.0: mlx5_cmd_check:710:(pid 1710): MANAGE_PAGES(0x108) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x4e2106)
[ 4.713926] mlx5_core 0000:01:00.0: give_pages:311:(pid 1710): func_id 0x0, npages 14972, err -5
[ 4.742890] mlx5_core 0000:01:00.0: failed to allocate init pages
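If it helps anyone reading the log, the failing command line and the give_pages line can be pulled apart with a small Python sketch like this. It is only a sketch: the regexes are written against these two lines and are not a general mlx5 log parser, and converting npages to bytes assumes 4 KiB firmware pages (which I believe matches the mlx5 driver's adapter page size, but treat it as an assumption).

```python
import re

# The two interesting lines from the boot log, copied verbatim.
CMD_LINE = ("[ 4.700002] mlx5_core 0000:01:00.0: mlx5_cmd_check:710:(pid 1710): "
            "MANAGE_PAGES(0x108) op_mod(0x1) failed, status bad system state(0x4), "
            "syndrome (0x4e2106)")
PAGES_LINE = ("[ 4.713926] mlx5_core 0000:01:00.0: give_pages:311:(pid 1710): "
              "func_id 0x0, npages 14972, err -5")

# Assumption: each page handed to the firmware by MANAGE_PAGES is 4 KiB.
FW_PAGE_SIZE = 4096

# Matches "<OP>(<opcode>) op_mod(<n>) failed, status <text>(<n>), syndrome (<n>)".
CMD_RE = re.compile(
    r"(?P<op>\w+)\((?P<opcode>0x[0-9a-f]+)\) op_mod\((?P<op_mod>0x[0-9a-f]+)\) failed, "
    r"status (?P<status_text>[\w ]+)\((?P<status>0x[0-9a-f]+)\), "
    r"syndrome \((?P<syndrome>0x[0-9a-f]+)\)"
)
PAGES_RE = re.compile(r"npages (?P<npages>\d+), err (?P<err>-?\d+)")


def parse_cmd_failure(line):
    """Return the opcode, op_mod, status and syndrome of a failed command."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("not an mlx5_cmd_check failure line")
    return {
        "op": m["op"],
        "opcode": int(m["opcode"], 16),
        "op_mod": int(m["op_mod"], 16),
        "status_text": m["status_text"],
        "status": int(m["status"], 16),
        "syndrome": int(m["syndrome"], 16),
    }


def requested_bytes(line, page_size=FW_PAGE_SIZE):
    """Return (bytes the firmware asked for, kernel error code)."""
    m = PAGES_RE.search(line)
    if m is None:
        raise ValueError("no npages field in line")
    return int(m["npages"]) * page_size, int(m["err"])


if __name__ == "__main__":
    info = parse_cmd_failure(CMD_LINE)
    print(info["op"], hex(info["status"]), hex(info["syndrome"]))
    nbytes, err = requested_bytes(PAGES_LINE)
    print(f"{nbytes} bytes ({nbytes / 2**20:.1f} MiB), err {err}")
```

For these lines it reports MANAGE_PAGES with status 0x4 and syndrome 0x4e2106, and roughly 58.5 MiB of host memory requested by the firmware; err -5 is -EIO on Linux.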

Any clues as to whether this points to a HW problem or a SW problem?

I found the syndrome in the "Mellanox error syndrome lists" gist on GitHub:

https://gist.github.com/lukego/8b4e567f4d5c4b60da6545e063888391

BAD_SYS_STATE | 0x4E2106 | manage pages: failed to read io or write host mem
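If you want to check further syndromes against that list, a tiny lookup table works. This minimal sketch contains only the one row quoted above from the gist; `SYNDROMES` and `describe_syndrome` are hypothetical names, and any other entries would have to be copied from the gist by hand.

```python
# Hypothetical local copy of the syndrome list; only the row quoted
# from the gist is included here.
SYNDROMES = {
    0x4E2106: ("BAD_SYS_STATE", "manage pages: failed to read io or write host mem"),
}


def describe_syndrome(syndrome):
    """Format a syndrome using the local table, or mark it unknown."""
    status, text = SYNDROMES.get(syndrome, ("UNKNOWN", "not in local table"))
    return f"{status} (0x{syndrome:X}): {text}"


if __name__ == "__main__":
    # The syndrome from the failing MANAGE_PAGES command.
    print(describe_syndrome(0x4e2106))
```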