Issues with setting up Storage Spaces Direct

Hello everyone,

I am working on setting up an S2D cluster and have run into an issue where I am unable to get my nodes to communicate via RDMA. I have tested with the Test-RDMA.ps1 script and DISKSPD provided in another post in the Mellanox Community.

Here is my hardware configuration:

Configuration: 4 nodes, each with the following:

Hardware: Intel R2224WTTYSR Server Systems

256GB Samsung DDR4 LRDIMMs

2x Intel E5-2620 v4 Xeon CPU

1x Mellanox ConnectX4 - MCX414A-BCAT

1x Broadcom LSI 3805-24i HBA

2x Intel DC P3700 800GB for Journal\cache drives

4x Seagate 2TB SAS HDs for Capacity drives

Networking:

1x Netgear 10GbE network switch for VMs

2x Mellanox SX1012 12 Port QSFP28 Switch for RDMA\cluster Traffic

8x MC2210128-003 Mellanox LinkX Cables

We are not using SET teams and are using the ConnectX4 NICs only for RoCE storage traffic.

All nodes are set up with this configuration for the RDMA-enabled NICs:

Attached is the configuration of my Mellanox SX1012 switch.

Any help in the right direction would be greatly appreciated.

Thanks!

TOR1 - Config.txt.zip (903 Bytes)

Before running tests that involve the I/O system, can you check the following?

  • Does TCP/IP connectivity work (ping)?

  • Do the nd_write_bw/nd_read_bw tests work?

  • Are you able to run the nd_XXXX tests on the same machine, using two shell windows with the server in one and the client in the other?

  • What failure do you get when the RDMA test fails?
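
For the first item in the list above, a rough Python sketch of a reachability sweep across the storage addresses might look like the following. The function names and the IP addresses in the comments are made up for illustration, and the script only covers the ping step, not the nd_* tests.

```python
import platform
import subprocess


def ping_command(host, count=2, system=None):
    """Build a ping command line for the given (or current) OS."""
    system = system or platform.system()
    # Windows ping takes -n for the echo count; Linux and macOS take -c.
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def run_quiet(cmd):
    """Run a command silently and return True if it exited with status 0."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def check_reachability(hosts, runner=run_quiet):
    """Ping each host and return a dict mapping host -> True/False."""
    return {host: runner(ping_command(host)) for host in hosts}


# Example use (hypothetical storage-network addresses, replace with your own):
#   for host, ok in check_reachability(["192.168.100.11", "192.168.100.12"]).items():
#       print(host, "reachable" if ok else "NOT reachable")
```

The exit code is only a coarse signal, so if a node shows as reachable but RDMA still fails, it is worth reading the actual ping output too.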

We are using S2D with IB on ConnectX 3 cards. No problems.

I am not familiar with the Test-RDMA script.

How are you testing RDMA? I use Windows Performance Monitor. It has RDMA-specific counters that make it easy to track RDMA traffic. Other than setup and drivers, there was not much else to do.
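
If you would rather check the counters from a log than watch them live, here is a small Python sketch that reads counter samples in the PDH-CSV layout (the CSV format that tools such as typeperf write) and reports which counters ever went above zero. The host name, adapter name, and counter paths in the sample are assumptions for illustration; use whatever counter names your system actually lists.

```python
import csv
import io


def rdma_active_counters(pdh_csv_text, threshold=0.0):
    """Return {counter_path: peak_value} for counters whose peak exceeds threshold."""
    # Skip blank lines so leading/trailing newlines in the input do not matter.
    rows = [r for r in csv.reader(io.StringIO(pdh_csv_text)) if r]
    if len(rows) < 2:
        return {}
    header, samples = rows[0], rows[1:]
    peaks = {}
    for col in range(1, len(header)):  # column 0 holds the timestamp
        values = []
        for row in samples:
            try:
                values.append(float(row[col]))
            except (IndexError, ValueError):
                continue  # blank or missing sample
        if values and max(values) > threshold:
            peaks[header[col]] = max(values)
    return peaks


# Hypothetical sample: one adapter, inbound traffic seen, no outbound traffic.
SAMPLE = r'''
"(PDH-CSV 4.0)","\\NODE1\RDMA Activity(Hypothetical NIC)\RDMA Inbound Bytes/sec","\\NODE1\RDMA Activity(Hypothetical NIC)\RDMA Outbound Bytes/sec"
"01/01/2024 10:00:00.000","0.000000","0.000000"
"01/01/2024 10:00:01.000","1250000.500000","0.000000"
'''

for counter, peak in rdma_active_counters(SAMPLE).items():
    print(f"{counter}: peak {peak:.1f}")
```

If every RDMA counter stays at zero while a storage test is running, the traffic is most likely falling back to plain TCP.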

Have you installed the latest drivers for your cards?