Advice on ROCEv1 with X5 Network Adapter - Windows SMB Log messages

Hello,
I’m currently attempting to implement a basic RoCE v1 configuration between two identical servers that will eventually be used for a storage cluster.
My current setup is as follows:

Server1 - Windows Server 2019 - Mellanox/NVIDIA ConnectX-5 NIC - Model MCX512F - driver 24.7.26520.0 - firmware 16.35.3006
Server2 - Windows Server 2019 - Mellanox/NVIDIA ConnectX-5 NIC - Model MCX512F - driver 24.7.26520.0 - firmware 16.35.3006

The two servers connect over SFP28 fiber to a switch that has a corresponding RoCE configuration in place.

Each adapter port is addressed as follows:

Server1:
X5P1 - 10.11.15.15
X5P2 - 10.11.16.15

Server2:
X5P1 - 10.11.15.16
X5P2 - 10.11.16.16

I’ve run all of the NetQosPolicy PowerShell commands that I believe our eventual RDMA needs will require.
I’m able to test RDMA / RoCE traffic successfully with the Microsoft-provided Test-Rdma.ps1 script, which appears to transfer all of its test workloads without any issues. I’m also able to copy large files between the two servers at high speed.

One item that keeps recurring in my event logs after a transfer completes is the following:

"RDMA connection disconnected.

Transport name: \Device\RdmaSmbIpv4_10.11.15.16
Milliseconds spent closing the connection: 0

Guidance:

Closing an RDMA connection should not take longer than 2 minutes. An RDMA IO that takes an abnormally long time to complete indicates a problem with the RDMA network adapters on this computer or its remote host. Contact your RDMA vendor for an updated driver and further troubleshooting."

The event ID is 1043.
The event source is SMBServer.
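
For reference, the close time in that message can be set against the 2-minute limit named in its own guidance text. Below is a minimal Python sketch of that check. The function name `parse_disconnect_event` and the constant `CLOSE_LIMIT_MS` are made up for illustration, and the message layout is assumed to match the text quoted above. It only compares the numbers and says nothing about whether the disconnect itself is expected.

```python
import re

# Limit taken from the guidance text in the event: closing an RDMA
# connection "should not take longer than 2 minutes".
CLOSE_LIMIT_MS = 2 * 60 * 1000


def parse_disconnect_event(message: str) -> dict:
    """Pull the transport name and close time out of an event 1043 message body.

    Hypothetical helper; assumes the message layout quoted in this post.
    """
    transport = re.search(r"Transport name:\s*(\S+)", message)
    closing = re.search(
        r"Milliseconds spent closing the connection:\s*(\d+)", message
    )
    if not transport or not closing:
        raise ValueError("message does not look like an RDMA disconnect event")
    close_ms = int(closing.group(1))
    return {
        "transport": transport.group(1),
        "close_ms": close_ms,
        "exceeds_limit": close_ms > CLOSE_LIMIT_MS,
    }


# The message body as it appears in the event log entry above.
sample = (
    "RDMA connection disconnected.\n\n"
    "Transport name: \\Device\\RdmaSmbIpv4_10.11.15.16\n"
    "Milliseconds spent closing the connection: 0\n"
)

print(parse_disconnect_event(sample))
```

For the entry quoted above, the reported close time is 0 ms, far below the 120000 ms limit, so the "abnormally long time" condition described in the guidance is not what this particular entry shows.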

My question is this: is this message expected, or does it indicate a problem with my RoCE configuration? Has anyone else seen this in a similar situation or configuration?

Any guidance here would be appreciated.

Hello @kmcdevitt,

Thank you for posting your query on our community. The “RDMA connection disconnected” message indicates an unstable RDMA connection.

Please note that the minimum supported ConnectX-5 firmware version with WinOF-2 24.7 drivers (currently installed on your servers) is 16.35.4030 // 16.35.3502.

You mentioned that the firmware version running on your cards is 16.35.3006. Please upgrade the firmware to a supported version.
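
To illustrate the comparison, here is a minimal Python sketch that checks the installed firmware version against both minimum values listed above. It assumes the versions compare field by field as dotted integers, which is an assumption about the version scheme and not something stated here. The helper names `parse_version` and `meets_minimum` are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Split a dotted version string such as '16.35.3006' into integer fields."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at or above the minimum.

    Assumption: fields are compared left to right as integers.
    """
    return parse_version(installed) >= parse_version(minimum)


# Values quoted in this thread.
INSTALLED = "16.35.3006"
MINIMUM_CANDIDATES = ["16.35.4030", "16.35.3502"]

for minimum in MINIMUM_CANDIDATES:
    status = "ok" if meets_minimum(INSTALLED, minimum) else "below minimum"
    print(f"installed {INSTALLED} vs minimum {minimum}: {status}")
```

Under that assumption, 16.35.3006 falls below both listed values, so the cards need the upgrade whichever of the two minimums applies.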

In addition, you can refer to the article below for RoCE configuration examples:
https://enterprise-support.nvidia.com/s/article/recommended-network-configuration-examples-for-roce-deployment#jive_content_id_Recommended_Configurations

Thanks,
Bhargavi