Hi all,
I’m running our ibverbs-based program to communicate over InfiniBand. Every once in a while I get this error on one of the nodes:
ibv_get_async_event: dev:mlx4_0 evt: LID change
When this happens, the peer nodes will report:
ibv_get_async_event: dev:mlx4_0 evt: client reregistration
Once this happens, the IB connection starts to show problems and eventually shuts down.
We are using MT25408 ConnectX-3 QDR NICs and an InfiniScale IV QDR switch.
Thanks!
-Shawn