I am trying to configure NFS for our InfiniBand network and am following the instructions in "HowTo Configure NFS over RDMA (RoCE)": https://community.mellanox.com/s/article/howto-configure-nfs-over-rdma--roce-x
I installed the MLNX_OFED drivers on CentOS 6.8. (I had originally configured the network and the IPoIB interface using the RHEL 7 manual ("InfiniBand and RDMA Networking", Red Hat Customer Portal) and was running NFS over IPoIB, but was getting a lot of page allocation failures.)
I used the mlnxofedinstall script which completed successfully and updated the firmware, e.g.:
…
Device (84:00.0):
84:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Link Width: x8
PCI Link Speed: 8GT/s
Installation finished successfully.
Preparing… ########################################### [100%]
1:mlnx-fw-updater ########################################### [100%]
Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf
Attempting to perform Firmware update…
Querying Mellanox devices firmware …
Device #1:
Device Type: ConnectX3
Part Number: MCX354A-FCB_A2-A5
Description: ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6
PSID: MT_1090120019
PCI Device Name: 84:00.0
Port1 GUID: e41d2d03006f89f1
Port2 GUID: e41d2d03006f89f2
Versions: Current Available
FW 2.32.5100 2.36.5150
PXE 3.4.0306 3.4.0740
Status: Update required
Found 1 device(s) requiring firmware update…
Device #1: Updating FW … Done
Restart needed for updates to take effect.
Log File: /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0.17971.logs/fw_update.log
Please reboot your system for the changes to take effect.
To load the new driver, run:
/etc/init.d/openibd restart
I rebooted the system and then ran the self test:
hca_self_test.ofed
---- Performing Adapter Device Self Test ----
Number of CAs Detected … 1
PCI Device Check … PASS
Kernel Arch … x86_64
Host Driver Version … MLNX_OFED_LINUX-3.4-1.0.0.0 (OFED-3.4-1.0.0): 2.6.32-642.el6.x86_64
Host Driver RPM Check … PASS
Firmware on CA #0 VPI … v2.36.5150
Host Driver Initialization … PASS
Number of CA Ports Active … 0
Port State of Port #1 on CA #0 (VPI)… INIT (InfiniBand)
Port State of Port #2 on CA #0 (VPI)… DOWN (InfiniBand)
Error Counter Check on CA #0 (VPI)… FAIL
REASON: found errors in the following counters
Errors in /sys/class/infiniband/mlx4_0/ports/1/counters
port_rcv_errors: 93
Kernel Syslog Check … PASS
Node GUID on CA #0 (VPI) … e4:1d:2d:03:00:6f:89:f0
------------------ DONE ---------------------
As you can see, the error counter check fails on port_rcv_errors. Also, the port state for Port #1 remains at INIT until I start the subnet manager by hand (/etc/init.d/opensmd start), which we need because we have an unmanaged switch. The subnet manager used to start automatically, so maybe the OFED installation wasn't completely successful?
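For anyone wanting to re-check the two symptoms above without rerunning the whole self test, here is a minimal sketch that reads the same sysfs files the self test reports on. The function names and the IB_SYSFS override are hypothetical helpers introduced for illustration; the state file is assumed to hold a line such as "2: INIT" or "4: ACTIVE".

```shell
#!/bin/sh
# Sketch: read port state and one error counter from sysfs.
# IB_SYSFS is a hypothetical override so the functions can be pointed
# at a test directory; the default is the real sysfs location.
IB_SYSFS="${IB_SYSFS:-/sys/class/infiniband}"

# Print the port state name, e.g. "INIT" from a state file containing "2: INIT".
ib_port_state() {
    # $1 = device (e.g. mlx4_0), $2 = port number
    sed 's/^[0-9]*: *//' "$IB_SYSFS/$1/ports/$2/state"
}

# Print the value of a named counter, e.g. port_rcv_errors.
ib_port_counter() {
    # $1 = device, $2 = port number, $3 = counter name
    cat "$IB_SYSFS/$1/ports/$2/counters/$3"
}

# Return success only when the port state is ACTIVE.
ib_port_is_active() {
    [ "$(ib_port_state "$1" "$2")" = "ACTIVE" ]
}

# Example (commented out so that sourcing this file has no side effects):
# ib_port_is_active mlx4_0 1 || echo "port 1 is $(ib_port_state mlx4_0 1)"
# ib_port_counter mlx4_0 1 port_rcv_errors
```

If the port only leaves INIT after opensmd is started by hand, it may simply be that the init script is no longer enabled at boot; on a SysV-init system like CentOS 6, `chkconfig --list opensmd` should show whether that is the case.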
Additionally, I am unable to configure NFS for RDMA, e.g.:
echo rdma 20049 > /proc/fs/nfsd/portlist
-bash: echo: write error: Protocol not supported
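One possible cause of "Protocol not supported" here is that the server-side RDMA transport module is not loaded when the write happens. This is a guess, not a confirmed diagnosis. The sketch below checks for the module first; it assumes the module is named svcrdma on this kernel generation, and the function names and the MODULES_FILE override are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch: check for the server-side RDMA transport module before
# writing to the nfsd portlist.
# MODULES_FILE is a hypothetical override for testing; the default is
# the real /proc/modules.
MODULES_FILE="${MODULES_FILE:-/proc/modules}"

# Return success if the named kernel module appears in the modules list.
module_loaded() {
    # $1 = module name; each /proc/modules line starts with the name and a space
    grep -q "^$1 " "$MODULES_FILE"
}

# Register the RDMA listener only if svcrdma is loaded; otherwise say why not.
enable_nfsd_rdma() {
    # $1 = port (20049 in the post), $2 = portlist path (defaults to the real one)
    port="$1"
    portlist="${2:-/proc/fs/nfsd/portlist}"
    if ! module_loaded svcrdma; then
        echo "svcrdma is not loaded; try 'modprobe svcrdma' first" >&2
        return 1
    fi
    echo "rdma $port" > "$portlist"
}

# Example (commented out; needs root and a running nfsd):
# enable_nfsd_rdma 20049
```

If `modprobe svcrdma` itself fails, that would suggest the installed driver stack does not ship the module for this kernel, which is worth checking against the MLNX_OFED release notes for the installed version.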