I have a cluster in which all of the nodes (the head node and the slave nodes) use ConnectX-4 dual-port 4X EDR IB cards (MCX456A-ECAT), connected to an externally managed MSB-7890 36-port 4X EDR IB switch.
All of the nodes are also running CentOS 7.6.1810 with the ‘InfiniBand Support’ software group installed (I am staying on this inbox stack because it still supports NFSoRDMA).
On the head node, I have four Samsung 860 EVO 1 TB SATA 6 Gbps SSDs in RAID0 through the Marvell 9230 controller on an Asus P9X79-E WS motherboard.
Testing on the head node itself shows that I can get around 21.9 Gbps of total write throughput when running:
$ time -p dd if=/dev/zero of=10Gfile bs=1024k count=10240
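Note that dd from /dev/zero without a sync or direct-I/O flag mostly measures the page cache rather than the array itself; a variant of the same command that forces the writes to disk would be:

$ time -p dd if=/dev/zero of=10Gfile bs=1024k count=10240 oflag=direct

so the 21.9 Gbps figure should probably be read as a best case.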
But when I try to do the same thing over IB (running the same dd on the NFSoRDMA mount from a slave node), I only get about 8.5 Gbps at best.
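To separate the fabric from the storage path, a raw RDMA bandwidth test between two nodes can be run with the perftest tools (assuming the HCA enumerates as mlx5_0; substitute whatever ibstat reports):

On the head node: $ ib_write_bw -d mlx5_0
On a slave node: $ ib_write_bw -d mlx5_0 aes0

If that reports close to EDR line rate, the IB link itself is not the bottleneck.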
As far as I can tell, NFSoRDMA itself is configured properly.
In /etc/exports, /home/cluster is exported read-write to the slave nodes.
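A representative entry (the subnet below is a placeholder for the cluster's actual addressing, and the exact option list is an assumption) would look like:

/home/cluster 192.168.1.0/24(rw,async,no_root_squash)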
In /etc/rdma/rdma.conf, the NFSoRDMA transport modules are set to load.
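Those are the two module-load switches in that file, which for this setup should read as below (XPRTRDMA is the client-side transport, SVCRDMA is the server-side transport needed on the head node; I believe the stock file defaults SVCRDMA_LOAD to no):

# load the NFSoRDMA client transport (xprtrdma)
XPRTRDMA_LOAD=yes
# load the NFSoRDMA server transport (svcrdma)
SVCRDMA_LOAD=yes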
Here is the relevant /etc/fstab entry on the slave nodes:
aes0:/home/cluster /home/cluster nfs defaults,rdma,port=20049 0 0
And here is confirmation that the NFS share is mounted using RDMA:
aes0:/home/cluster on /home/cluster type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=rdma,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=xxxxxxx,local_lock=none,addr=xxxxxxx)
The RAID volume is mounted like this:
/dev/sdb1 on /home/cluster type xfs (rw,relatime,attr2,inode64,noquota)
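For completeness, the array's raw sequential read speed can also be sanity-checked below the filesystem with hdparm (as root, against the RAID volume shown above):

# hdparm -t /dev/sdb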
I don’t understand why writes over the NFSoRDMA mount appear to be capped below 10 Gbps.
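One assumption worth double-checking is that the ports actually negotiated EDR; ibstat on each node (again assuming the HCA enumerates as mlx5_0) should show Rate: 100 on the active port:

$ ibstat mlx5_0 | grep -i rate

A link that fell back to a lower rate, or a card sitting in a bandwidth-starved PCIe slot, would cap throughput well below what the hardware is rated for.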
Your help is greatly appreciated.