I’m getting these messages and need help decoding vendor_error_syndrome and hw_error_syndrome values:
[587496.823345] mlx5_0/1: QP 65526 error: unrecognized status (0x22 0x0 0x95)
[587496.833350] mlx5_0/1: QP 63410 error: unrecognized status (0x22 0x0 0x95)
[587496.843191] mlx5_0/1: QP 66547 error: unrecognized status (0x22 0x0 0x95)
It seems that the completion error was returned with the wrong index. This might be because the error CQE buffer is corrupted.
I suggest adding more debug prints, for example to dump the error CQE.
Further debugging would require a support case.
Can you be more specific? Which index are you referring to: wr_id? The adapter writes the CQE, so how can it be corrupted? Can you give me the exact meaning of 0x22 and 0x95?
In addition to 0x22 and 0x95, there is also this one:
[587500.402815] mlx5_1/1: QP 72315 error: unrecognized status (0x23 0x0 0x9d)
What do 0x23 and 0x9d mean?
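
While waiting for an authoritative decode, it may help to tally which status triples show up on which device, so a support case can state exactly what was seen. Below is a minimal sketch in Python that parses the log lines quoted above and counts each triple per device. It does not interpret the values: the names `parse_line`, `tally`, and the `codes` field are my own, and I am not assuming which of the three numbers corresponds to vendor_error_syndrome or hw_error_syndrome.

```python
import re
from collections import Counter

# Matches lines of the form quoted in this thread, e.g.
# [587496.833350] mlx5_0/1: QP 63410 error: unrecognized status (0x22 0x0 0x95)
LINE_RE = re.compile(
    r"(?P<dev>mlx5_\d+)/(?P<port>\d+): QP (?P<qp>\d+) error: "
    r"unrecognized status \((?P<a>0x[0-9a-fA-F]+) "
    r"(?P<b>0x[0-9a-fA-F]+) (?P<c>0x[0-9a-fA-F]+)\)"
)


def parse_line(line):
    """Return a dict for one matching log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "device": m.group("dev"),
        "port": int(m.group("port")),
        "qp": int(m.group("qp")),
        # The three values are kept in log order; their meaning is not assumed.
        "codes": tuple(int(m.group(k), 16) for k in ("a", "b", "c")),
    }


def tally(lines):
    """Count occurrences of each (device, codes) pair across the given lines."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["device"], rec["codes"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[587496.823345] mlx5_0/1: QP 65526 error: unrecognized status (0x22 0x0 0x95)",
        "[587496.833350] mlx5_0/1: QP 63410 error: unrecognized status (0x22 0x0 0x95)",
        "[587496.843191] mlx5_0/1: QP 66547 error: unrecognized status (0x22 0x0 0x95)",
        "[587500.402815] mlx5_1/1: QP 72315 error: unrecognized status (0x23 0x0 0x9d)",
    ]
    for (device, codes), n in sorted(tally(sample).items()):
        print(device, " ".join(hex(c) for c in codes), "count:", n)
```

In practice the input could be the saved output of `dmesg` read line by line instead of the hard-coded sample list.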