I’m running an S2D cluster on three Dell PowerEdge R740xd servers, connected to each other through two Dell S4048-ON switches. On Windows Server I installed the WinOF driver and configured DCB with PFC to make the network lossless.
Disk array performance is poor, and the SMBServer event log shows the following:
"RDMA connection disconnected.
Transport name: \Device\RdmaSmbIpv4_10.31.1.4
Milliseconds spent closing the connection: 0
Guidance:
Closing an RDMA connection should not take longer than 2 minutes. An RDMA IO that takes an abnormally long time to complete indicates a problem with the RDMA network adapters on this computer or its remote host. Contact your RDMA vendor for an updated driver and further troubleshooting."
Here is the output of vstat from the same server:
"hca_idx=0
uplink={BUS=PCI_E Gen3, SPEED=8.0 Gbps, WIDTH=x8, CAPS=8.0*x8}
MSI-X={ENABLED=1, SUPPORTED=128, GRANTED=24, ALL_MASKED=N}
vendor_id=0x02c9
vendor_part_id=4103
hw_ver=0x0
fw_ver=2.42.5000
PSID=MT_1090111023
node_guid=248a:0703:00bb:4210
num_phys_ports=2
port=1
port_guid=268a:07ff:febb:4210
port_state=PORT_ACTIVE (4)
link_speed=NA
link_width=NA
rate=40.00 Gbps
port_phys_state=LINK_UP (5)
active_speed=40.00 Gbps
sm_lid=0x0000
port_lid=0x0000
port_lmc=0x0
transport=RoCE v2.0
rroce_udp_port=0x12b7
max_mtu=2048 (4)
active_mtu=2048 (4)
GID[0]=0000:0000:0000:0000:0000:ffff:0a1f:0104
GID[1]=fe80:0000:0000:0000:3048:ef64:8d42:fbc3
port=2
port_guid=268a:07ff:febb:4211
port_state=PORT_ACTIVE (4)
link_speed=NA
link_width=NA
rate=40.00 Gbps
port_phys_state=LINK_UP (5)
active_speed=40.00 Gbps
sm_lid=0x0000
port_lid=0x0000
port_lmc=0x0
transport=RoCE v2.0
rroce_udp_port=0x12b7
max_mtu=2048 (4)
active_mtu=2048 (4)
GID[0]=0000:0000:0000:0000:0000:ffff:0a1f:0204
GID[1]=fe80:0000:0000:0000:85ba:adfd:5483:2300"
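
To tie the event back to a specific port: GID[0] on each port looks like an IPv4-mapped address, so it can be decoded and compared with the transport name in the event (\Device\RdmaSmbIpv4_10.31.1.4). Below is a minimal Python sketch of that decoding, assuming the GID follows the standard ::ffff:a.b.c.d layout. The helper name gid_to_ipv4 is made up for illustration and is not part of vstat or WinOF.

```python
import ipaddress
from typing import Optional


def gid_to_ipv4(gid: str) -> Optional[str]:
    """Return the IPv4 address embedded in an IPv4-mapped GID.

    Returns None when the GID is not in the ::ffff:0:0/96 range,
    for example a link-local (fe80::) GID.
    """
    addr = ipaddress.IPv6Address(gid)
    mapped = addr.ipv4_mapped  # None unless the address is IPv4-mapped
    return str(mapped) if mapped is not None else None


# GID[0] values copied from the vstat output above.
print(gid_to_ipv4("0000:0000:0000:0000:0000:ffff:0a1f:0104"))  # 10.31.1.4 (port 1)
print(gid_to_ipv4("0000:0000:0000:0000:0000:ffff:0a1f:0204"))  # 10.31.2.4 (port 2)
```

By this decoding, the disconnect in the event above was logged against port 1 (10.31.1.4).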