Hello
A CFD solver runs with Intel MPI over Mellanox InfiniBand on 2 nodes (HP Apollo 230), both running RHEL 7.9.
Sometimes the same computation crashes with messages like this:
mlx5_core 0000:5c:00.0: mlx5_eq_async_int:390:(pid 13335): CQ error on CQN 0x45, syndrome 0x1
Any hints?
ibstat output:
CA 'mlx5_0'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.27.4000
    Hardware version: 0
    Node GUID: 0xb88303ffff79d104
    System image GUID: 0xb88303ffff79d104
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 61
        LMC: 0
        SM lid: 1
        Capability mask: 0x2659e84a
        Port GUID: 0xb88303ffff79d104
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.27.4000
    Hardware version: 0
    Node GUID: 0xb88303ffff79d105
    System image GUID: 0xb88303ffff79d104
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 65535
        LMC: 0
        SM lid: 0
        Capability mask: 0x2659e848
        Port GUID: 0xb88303ffff79d105
        Link layer: InfiniBand
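
In case it helps anyone comparing the two nodes, here is a minimal Python sketch that reads ibstat text and lists the ports that are not Active. The function names parse_ibstat and inactive_ports and the shortened SAMPLE string are my own illustration, not part of any Mellanox or Intel tool. It only relies on the "CA '...'", "Port N:" and "key: value" layout shown above, so treat it as a rough helper rather than a definitive parser.

```python
import re

# Shortened copy of the ibstat output from this post (illustrative only).
SAMPLE = """\
CA 'mlx5_0'
    CA type: MT4115
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
CA 'mlx5_1'
    CA type: MT4115
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
"""


def parse_ibstat(text):
    """Return one dict per port, holding the CA name, port number and port fields."""
    ports = []
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        # Accept straight or curly quotes around the CA name.
        m = re.match(r"CA\s+['\u2018](.+?)['\u2019]$", line)
        if m:
            ca = m.group(1)
            port = None
            continue
        m = re.match(r"Port\s+(\d+):$", line)
        if m:
            port = {"ca": ca, "port": int(m.group(1))}
            ports.append(port)
            continue
        # Fields that follow a "Port N:" header belong to that port.
        if port is not None and ":" in line:
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return ports


def inactive_ports(ports):
    """Return (ca, port, state, physical state) for every port that is not Active."""
    return [
        (p["ca"], p["port"], p.get("State"), p.get("Physical state"))
        for p in ports
        if p.get("State") != "Active"
    ]


if __name__ == "__main__":
    print(inactive_ports(parse_ibstat(SAMPLE)))
    # prints [('mlx5_1', 1, 'Down', 'Polling')]
```

On a live node the same functions could be fed the text captured from running ibstat. For the output in this post it reports only mlx5_1 port 1 as Down / Polling, while mlx5_0 port 1 is Active at rate 100.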