NFS/RDMA works with inbox driver, fails with OFED (ConnectX-3, Ubuntu 16.04)

I am testing Mellanox hardware and software with Ubuntu 16.04, using two machines with ConnectX-3 adapters connected directly with a cable, without a switch. I have tested all combinations of InfiniBand and Ethernet modes, inbox and OFED drivers, and three kernel versions, and discovered that I am unable to use NFS over RDMA with the OFED driver, while it works fine with the inbox driver (as included with Ubuntu 16.04). The problem occurs in both IB and Ethernet mode, but, strangely, the OFED driver fails in three distinct ways depending on which kernel is in use. Both machines exhibit the same issue, so this is not a hardware defect. Non-NFS RDMA tests such as ib_write_bw work fine with both drivers and all kernels.

The question is: am I doing something wrong, or is this a bug (or perhaps three bugs)? I followed the tutorials available online and got everything working except for the problem described here. The failures have an unhealthy look and feel (see the detailed description below), so I am leaning toward bugs. If so, where should I report them: Mellanox, openfabrics.org, linux-rdma@GitHub, LKML, somewhere else? The number of components involved and the variety of distinct symptoms make this issue somewhat of a conundrum.

Hardware - two identical machines

CPU: Intel Core i5-6600 3.30GHz

RAM: 64GB DDR4 non-ECC unbuffered

Motherboard: Gigabyte Z170X-UD5-CF

IB Adapter: Mellanox ConnectX-3 MCX353A-FCBT (Firmware version: 2.42.5000)

Software

OS: Ubuntu 16.04.3 LTS

Kernels tested:

4.4.0-101-generic (standard Ubuntu 16.04 kernel)

4.10.0-40-generic (HWE kernel)

4.13.0-17-generic (HWE-edge kernel)

OFED: MLNX_OFED_LINUX-4.2-1.0.0.0

Source: http://www.mellanox.com/downloads/ofed/MLNX_OFED-4.2-1.0.0.0/MLNX_OFED_LINUX-4.2-1.0.0.0-ubuntu16.04-x86_64.tgz

mlnx-nfsrdma:

mlnx_ofed/mlnx-ofa_kernel-4.0.git mlnx_ofed_4_2

commit f36c8704a9cd969fd5b3ecdf142c6f2ffde495f8

Test environment

The following modules are loaded (if not already loaded):

modprobe mlx4_ib; modprobe ib_umad; modprobe ib_cm; modprobe ib_ucm; modprobe rdma_ucm

In IB mode, opensm is started on one of the machines, and module ib_ipoib is loaded on both machines.
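A quick sanity check at this point (my own addition, not part of the original procedure, assuming the infiniband-diags and ibverbs-utils packages are installed) is to confirm that opensm has brought the port up before configuring IP:

ibstat mlx4_0                              # port State should be "Active", Physical state "LinkUp"

ibv_devinfo | grep -E 'state|link_layer'   # shows port state and link layer (InfiniBand vs Ethernet)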

Networking is then configured (IPs 10.2.0.1 and 10.2.0.2) and verified to work (ping, iperf, qperf, ib_write_bw, etc.), e.g.:

ifconfig ib0 10.2.0.1 netmask 255.255.255.0

(interface name in Ethernet mode is different)
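For illustration only, the Ethernet-mode equivalent is the same command with the Ethernet interface name substituted (the name below is a placeholder, not the real one from my systems):

ifconfig enp1s0 10.2.0.1 netmask 255.255.255.0   # enp1s0 is a placeholder interface name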

On the machine acting as NFS server, a ramdisk is created and exported as a share, and module svcrdma is loaded:

mount -t tmpfs -o size=60G tmpfs /mnt/ramdisk/

exportfs -o rw,fsid=1,async,no_subtree_check 10.2.0.0/24:/mnt/ramdisk/

modprobe svcrdma

echo rdma 20049 > /proc/fs/nfsd/portlist
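Not part of the original steps, but a useful sanity check here is to read the NFS server's port list back and confirm the export is visible:

cat /proc/fs/nfsd/portlist   # should now contain a line reading "rdma 20049"

exportfs -v                  # should list /mnt/ramdisk with the options given above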


On the machine that will be the NFS client, module xprtrdma is loaded, and we attempt to mount the share:

modprobe xprtrdma

mount -o rdma,port=20049 10.2.0.1:/mnt/ramdisk /mnt/remote

This fails when the client runs the OFED driver, in three different ways depending on which kernel is in use.

Note: the non-RDMA NFS mount works fine.
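For completeness, the non-RDMA control mount is just the default TCP transport, along these lines:

mount -t nfs 10.2.0.1:/mnt/ramdisk /mnt/remote   # plain NFS over TCP; works with both inbox and OFED drivers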

Kernel 4.13.0-17-generic

The mount command displays an error:

mount.nfs: Cannot allocate memory

Syslog contains an entry:

kernel: [57284.088130] rpcrdma: ‘frwr’ mode is not supported by device mlx4_0
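In case it helps with reproducing this, one way to confirm which mlx4_ib build was loaded when the message appeared (my own diagnostic suggestion, not part of the original report) is:

modinfo mlx4_ib | grep -E '^filename|^version'   # OFED DKMS builds typically live under .../updates/dkms, inbox builds under .../kernel/drivers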

Kernel 4.10.0-40-generic

The mount command does not display any error and finishes as if successful, but the share is not mounted. The syslog contents are included below. An additional symptom is that after this mount command the system is left in a state that prevents it from shutting down gracefully, and it must be forcibly power-cycled.

[ 86.339938] FS-Cache: Loaded

[ 86.350372] FS-Cache: Netfs ‘nfs’ registered for caching

[ 86.370129] NFS: Registering the id_resolver key type

[ 86.370133] Key type id_resolver registered

[ 86.370134] Key type id_legacy registered

[ 86.370299] BUG: unable to handle kernel paging request at ffffffffc09fcb30

[ 86.370323] IP: try_module_get+0x3a/0xe0

[ 86.370332] PGD 3cd20c067

[ 86.370333] PUD 3cd20e067

[ 86.370339] PMD fb5cbb067

[ 86.370346] PTE f9ddef161

[ 86.370364] Oops: 0003 [#1] SMP

[ 86.370372] Modules linked in: nfsv4 nfs fscache rpcrdma bnep rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_ib(OE) ib_core(OE) zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic aes_x86_64 crypto_simd glue_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core input_leds intel_cstate intel_rapl_perf snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore mei_me intel_pch_thermal shpchp mei mac_hid intel_lpss_acpi intel_lpss

[ 86.370514] acpi_als tpm_infineon acpi_pad hci_uart btbcm btqca btintel bluetooth kfifo_buf industrialio nfsd auth_rpcgss nfs_acl lockd grace knem(OE) coretemp parport_pc ppdev sunrpc lp parport autofs4 mlx4_en(OE) hid_generic usbhid uas usb_storage mxm_wmi i915 mlx4_core(OE) devlink e1000e drm_kms_helper ixgbe syscopyarea igb sysfillrect sysimgblt ahci fb_sys_fops dca libahci i2c_algo_bit drm ptp pps_core mdio mlx_compat(OE) wmi video pinctrl_sunrisepoint i2c_hid pinctrl_intel hid fjes

[ 86.370618] CPU: 1 PID: 2896 Comm: mount.nfs Tainted: P OE 4.10.0-40-generic #44~16.04.1-Ubuntu

[ 86.371539] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016

[ 86.372317] task: ffff88c1e862c500 task.stack: ffffb67488298000

[ 86.373192] RIP: 0010:try_module_get+0x3a/0xe0

[ 86.374075] RSP: 0018:ffffb6748829b740 EFLAGS: 00010202

[ 86.374933] RAX: 000000005d5e415d RBX: 0000000000000000 RCX: 000000005d5e415d

[ 86.375802] RDX: 000000005d5e415e RSI: ffffffffc09fcb30 RDI: ffffffffc09fc820

[ 86.376654] RBP: ffffb6748829b758 R08: ffff88c27fc10248 R09: 0000000000000000

[ 86.377482] R10: 0000000000000011 R11: 0000000000000000 R12: ffff88c20d3965e0

[ 86.378264] R13: ffff88c20d3965c0 R14: ffff88c21dc77c00 R15: ffffb6748829b7c8

[ 86.379060] FS: 00007f2d69152880(0000) GS:ffff88c27fc80000(0000) knlGS:0000000000000000

[ 86.379857] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[ 86.380659] CR2: ffffffffc09fcb30 CR3: 0000000f9e4f5000 CR4: 00000000003406e0

[ 86.381522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[ 86.382315] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


[ 86.383109] Call Trace:

[ 86.383896] rpcrdma_create_id+0xdc/0x250 [rpcrdma]

[ 86.384691] rpcrdma_ia_open+0x26/0x110 [rpcrdma]

[ 86.385526] xprt_setup_rdma.part.11+0x151/0x400 [rpcrdma]

[ 86.386326] ? check_preempt_curr+0x54/0x90

[ 86.387127] ? ttwu_do_wakeup+0x19/0xe0

[ 86.387919] ? ttwu_do_activate+0x6f/0x80

[ 86.388706] xprt_setup_rdma+0x2f/0x50 [rpcrdma]

[ 86.389535] xprt_create_transport+0x85/0x220 [sunrpc]

[ 86.390326] rpc_create+0xe4/0x1e0 [sunrpc]

[ 86.391115] ? ktime_get+0x3c/0xb0

[ 86.391902] nfs_create_rpc_client+0x107/0x150 [nfs]

[ 86.392695] nfs4_init_client+0xa0/0x2b0 [nfsv4]

[ 86.393522] ? idr_alloc+0x62/0x150

[ 86.394302] ? __rpc_init_priority_wait_queue+0x76/0xb0 [sunrpc]

[ 86.395089] ? rpc_init_wait_queue+0x13/0x20 [sunrpc]

[ 86.395874] nfs_get_client+0x2e0/0x390 [nfs]

[ 86.396664] nfs4_set_client+0x93/0x120 [nfsv4]

[ 86.397512] nfs4_create_server+0x135/0x380 [nfsv4]

[ 86.398305] ? find_next_bit+0x15/0x20

[ 86.399092] nfs4_remote_mount+0x2e/0x60 [nfsv4]

[ 86.399874] mount_fs+0x38/0x160

[ 86.400650] ? __alloc_percpu+0x15/0x20

[ 86.401488] vfs_kern_mount+0x67/0x110

[ 86.402273] nfs_do_root_mount+0x84/0xc0 [nfsv4]

[ 86.403053] nfs4_try_mount+0x44/0xd0 [nfsv4]

[ 86.403843] ? get_nfs_version+0x27/0x90 [nfs]

[ 86.404633] nfs_fs_mount+0x728/0xda0 [nfs]

[ 86.405466] ? find_next_bit+0x15/0x20

[ 86.406253] ? nfs_clone_super+0x130/0x130 [nfs]

[ 86.407036] ? param_set_portnr+0x70/0x70 [nfs]

[ 86.407816] mount_fs+0x38/0x160

[ 86.408594] ? __alloc_percpu+0x15/0x20

[ 86.409421] vfs_kern_mount+0x67/0x110

[ 86.410234] do_mount+0x1e9/0xd20

[ 86.411005] SyS_mount+0x95/0xe0

[ 86.411778] entry_SYSCALL_64_fastpath+0x1e/0xad

[ 86.412551] RIP: 0033:0x7f2d6881db5a

[ 86.413364] RSP: 002b:00007fff9a37e258 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5

[ 86.414143] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d6881db5a

[ 86.414920] RDX: 0000000001f0f250 RSI: 0000000001f0f230 RDI: 0000000001f0f270

[ 86.415704] RBP: 0000000001f11100 R08: 0000000001f113f0 R09: 0000000001f113f0

[ 86.416489] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff9a37e2b0

[ 86.417319] R13: 00007fff9a37e2a4 R14: 0000000000000000 R15: 0000000001f0f010

[ 86.418101] Code: 54 53 0f 84 af 00 00 00 83 3f 02 0f 84 9b 00 00 00 8b 8f 10 03 00 00 85 c9 0f 84 8d 00 00 00 8d 51 01 48 8d b7 10 03 00 00 89 c8 0f b1 97 10 03 00 00 39 c8 89 c2 75 60 4c 8b 6d 08 0f 1f 44

[ 86.418932] RIP: try_module_get+0x3a/0xe0 RSP: ffffb6748829b740

[ 86.419748] CR2: ffffffffc09fcb30

[ 86.425682] ---[ end trace 7a8374258a719fb7 ]---


Kernel 4.4.0-101-generic

Symptoms are similar to those reported in this thread: NFS over RoCE Ubuntu 16.04 with latest OFED

The modprobe xprtrdma command displays an error:

stderr: modprobe: ERROR: could not insert ‘rpcrdma’: Invalid argument

…and logs in syslog:

[ 71.704632] rpcrdma: disagrees about version of symbol ib_create_cq

[ 71.704635] rpcrdma: Unknown symbol ib_create_cq (err -22)

[ 71.704639] rpcrdma: disagrees about version of symbol rdma_resolve_addr

[ 71.704639] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)

[ 71.704663] rpcrdma: disagrees about version of symbol ib_event_msg

[ 71.704664] rpcrdma: Unknown symbol ib_event_msg (err -22)

[ 71.704671] rpcrdma: disagrees about version of symbol ib_dereg_mr

[ 71.704672] rpcrdma: Unknown symbol ib_dereg_mr (err -22)

[ 71.704674] rpcrdma: disagrees about version of symbol ib_query_qp

[ 71.704675] rpcrdma: Unknown symbol ib_query_qp (err -22)

[ 71.704679] rpcrdma: disagrees about version of symbol rdma_disconnect

[ 71.704679] rpcrdma: Unknown symbol rdma_disconnect (err -22)

[ 71.704681] rpcrdma: disagrees about version of symbol ib_alloc_fmr

[ 71.704682] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)

[ 71.704695] rpcrdma: disagrees about version of symbol ib_dealloc_fmr

[ 71.704696] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)

[ 71.704697] rpcrdma: disagrees about version of symbol rdma_resolve_route

[ 71.704698] rpcrdma: Unknown symbol rdma_resolve_route (err -22)

[ 71.704706] rpcrdma: disagrees about version of symbol rdma_bind_addr

[ 71.704707] rpcrdma: Unknown symbol rdma_bind_addr (err -22)

[ 71.704713] rpcrdma: disagrees about version of symbol rdma_create_qp

[ 71.704714] rpcrdma: Unknown symbol rdma_create_qp (err -22)

[ 71.704716] rpcrdma: disagrees about version of symbol ib_map_mr_sg

[ 71.704716] rpcrdma: Unknown symbol ib_map_mr_sg (err -22)

[ 71.704718] rpcrdma: disagrees about version of symbol ib_destroy_cq

[ 71.704719] rpcrdma: Unknown symbol ib_destroy_cq (err -22)

[ 71.704720] rpcrdma: disagrees about version of symbol rdma_create_id

[ 71.704721] rpcrdma: Unknown symbol rdma_create_id (err -22)

[ 71.704742] rpcrdma: disagrees about version of symbol rdma_listen

[ 71.704743] rpcrdma: Unknown symbol rdma_listen (err -22)

[ 71.704744] rpcrdma: disagrees about version of symbol rdma_destroy_qp

[ 71.704745] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)

[ 71.704760] rpcrdma: Unknown symbol ib_query_device (err 0)

[ 71.704762] rpcrdma: disagrees about version of symbol ib_get_dma_mr

[ 71.704763] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)

[ 71.704773] rpcrdma: Unknown symbol ib_alloc_pd (err 0)

[ 71.704781] rpcrdma: disagrees about version of symbol ib_alloc_mr

[ 71.704782] rpcrdma: Unknown symbol ib_alloc_mr (err -22)

[ 71.704797] rpcrdma: disagrees about version of symbol rdma_connect

[ 71.704798] rpcrdma: Unknown symbol rdma_connect (err -22)

[ 71.704816] rpcrdma: disagrees about version of symbol rdma_destroy_id

[ 71.704816] rpcrdma: Unknown symbol rdma_destroy_id (err -22)

[ 71.704823] rpcrdma: disagrees about version of symbol rdma_accept

[ 71.704824] rpcrdma: Unknown symbol rdma_accept (err -22)

[ 71.704826] rpcrdma: disagrees about version of symbol ib_destroy_qp

[ 71.704826] rpcrdma: Unknown symbol ib_destroy_qp (err -22)

[ 71.704844] rpcrdma: disagrees about version of symbol ib_dealloc_pd

[ 71.704845] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)


The mount command displays an error:

mount.nfs: mount system call failed

…and logs in syslog:

[ 87.631109] FS-Cache: Loaded

[ 87.646332] FS-Cache: Netfs ‘nfs’ registered for caching

[ 87.648897] rpcrdma: disagrees about version of symbol ib_create_cq

[ 87.648899] rpcrdma: Unknown symbol ib_create_cq (err -22)

[ 87.648903] rpcrdma: disagrees about version of symbol rdma_resolve_addr

[ 87.648903] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)

[ 87.648928] rpcrdma: disagrees about version of symbol ib_event_msg

[ 87.648929] rpcrdma: Unknown symbol ib_event_msg (err -22)

[ 87.648936] rpcrdma: disagrees about version of symbol ib_dereg_mr

[ 87.648937] rpcrdma: Unknown symbol ib_dereg_mr (err -22)

[ 87.648940] rpcrdma: disagrees about version of symbol ib_query_qp

[ 87.648941] rpcrdma: Unknown symbol ib_query_qp (err -22)

[ 87.648944] rpcrdma: disagrees about version of symbol rdma_disconnect

[ 87.648945] rpcrdma: Unknown symbol rdma_disconnect (err -22)

[ 87.648947] rpcrdma: disagrees about version of symbol ib_alloc_fmr

[ 87.648948] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)

[ 87.648962] rpcrdma: disagrees about version of symbol ib_dealloc_fmr

[ 87.648963] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)

[ 87.648965] rpcrdma: disagrees about version of symbol rdma_resolve_route

[ 87.648965] rpcrdma: Unknown symbol rdma_resolve_route (err -22)

[ 87.648974] rpcrdma: disagrees about version of symbol rdma_bind_addr

[ 87.648975] rpcrdma: Unknown symbol rdma_bind_addr (err -22)

[ 87.648982] rpcrdma: disagrees about version of symbol rdma_create_qp

[ 87.648982] rpcrdma: Unknown symbol rdma_create_qp (err -22)

[ 87.648984] rpcrdma: disagrees about version of symbol ib_map_mr_sg

[ 87.648985] rpcrdma: Unknown symbol ib_map_mr_sg (err -22)

[ 87.648987] rpcrdma: disagrees about version of symbol ib_destroy_cq

[ 87.648988] rpcrdma: Unknown symbol ib_destroy_cq (err -22)

[ 87.648989] rpcrdma: disagrees about version of symbol rdma_create_id

[ 87.648990] rpcrdma: Unknown symbol rdma_create_id (err -22)

[ 87.649012] rpcrdma: disagrees about version of symbol rdma_listen

[ 87.649013] rpcrdma: Unknown symbol rdma_listen (err -22)

[ 87.649014] rpcrdma: disagrees about version of symbol rdma_destroy_qp

[ 87.649015] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)

[ 87.649031] rpcrdma: Unknown symbol ib_query_device (err 0)

[ 87.649034] rpcrdma: disagrees about version of symbol ib_get_dma_mr

[ 87.649034] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)

[ 87.649045] rpcrdma: Unknown symbol ib_alloc_pd (err 0)

[ 87.649054] rpcrdma: disagrees about version of symbol ib_alloc_mr

[ 87.649055] rpcrdma: Unknown symbol ib_alloc_mr (err -22)

[ 87.649072] rpcrdma: disagrees about version of symbol rdma_connect

[ 87.649072] rpcrdma: Unknown symbol rdma_connect (err -22)

[ 87.649093] rpcrdma: disagrees about version of symbol rdma_destroy_id

[ 87.649093] rpcrdma: Unknown symbol rdma_destroy_id (err -22)

[ 87.649100] rpcrdma: disagrees about version of symbol rdma_accept

[ 87.649101] rpcrdma: Unknown symbol rdma_accept (err -22)

[ 87.649104] rpcrdma: disagrees about version of symbol ib_destroy_qp

[ 87.649104] rpcrdma: Unknown symbol ib_destroy_qp (err -22)

[ 87.649124] rpcrdma: disagrees about version of symbol ib_dealloc_pd

[ 87.649125] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)

[ 87.678511] NFS: Registering the id_resolver key type

[ 87.678516] Key type id_resolver registered

[ 87.678517] Key type id_legacy registered
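The "disagrees about version of symbol" messages look as if the inbox-built rpcrdma module is being loaded against the OFED-built ib_core/rdma_cm modules. A way to check which tree each module comes from (again my own diagnostic suggestion, not part of the original setup):

modinfo rpcrdma | grep -E '^filename|^vermagic'

modinfo ib_core | grep -E '^filename|^vermagic'

modinfo rdma_cm | grep -E '^filename|^vermagic'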

Thank you for your time and advice.


Thank you very much for your reply. Could you (or anyone in the know) elaborate on this issue? In particular, I have the following questions:

  1. What is the rationale for removing support for NFS over RDMA from OFED, or was it removed without a reason? I had to dig through the release notes of several 4.0-w.x.y.z MLNX_OFED releases before I found a very laconic remark on the issue (4.0-2.0.0.1, page 14), and apart from that I was not able to find any other information whatsoever. On the other hand, documentation describing NFS over RDMA support in OFED is ample and easy to find.

  2. The ofed_info command from the release this thread is about (4.2) lists mlnx_nfsrdma and gives its version number. Is the ofed_info output incorrect, or does mlnx_nfsrdma provide functionality unrelated to NFS and RDMA? (Forgive me, I am not trying to be sarcastic, nor do I wish to contradict you; I am simply confused.)

  3. The nature of the problems I described in the first post does not suggest a missing module, but rather genuine bugs, particularly with kernel 4.10, where we get a crash (kernel oops). I imagine this could be caused by incompatibilities between the Linux kernels and the OFED driver. Would anyone be able to explain this in more detail?

  4. My tests show that RDMA does make a difference for NFS: in some cases I see significantly higher throughput, and also greater stability, with RDMA. So, if NFS over RDMA support has been removed from OFED, is it possible to restore it by installing an additional package, or perhaps by compiling it from source?

  5. Looking at a longer time frame (say, 5 years), what are the prospects that NFS over RDMA support will remain in future releases of the inbox drivers? Is MLNX_OFED an upstream for the inbox driver, or is the inbox driver developed independently? Not knowing the reason for the NFS over RDMA removal, I can only make wild guesses about the future.

I would like to add that using ConnectX-3 with Ubuntu 16.04 is not my end goal. I have been tasked with the design of a compute cluster, and in order to make informed decisions I obtained second-hand Mellanox hardware to test the technology. The purpose of my current tests, among other things, is to decide whether to use InfiniBand or Ethernet, and whether to go with Mellanox or another vendor. NFS is one of the reasons we want a fast interconnect (MPI is another). That is why the news that NFS over RDMA support is gone with the wind, for reasons unspecified, with just a small mention buried deep in one of the many release-note files, strikes me as quite unexpected and very problematic. May I kindly ask again for more information?

Thank you very much.


Having not received any satisfactory answer, I dug into the matter more deeply and discovered that both the MLNX_OFED release notes and alkx’s answer are inaccurate on this issue. That is to say, I found a working solution to this problem, which I describe here: How to use NFS over RDMA with MLNX_OFED [solution]

Hello,

I would also like to chime in that we are also interested in NFS over RDMA (over 100Gbit IB). One of the reasons for selecting Mellanox on our most recent IB cluster purchase was the availability of NFS over RDMA. We were extremely disappointed with Mellanox’s corporate decision to remove “official” support for NFS over RDMA from the Mellanox-distributed OFED package … and to hide this significant restriction from all top-level marketing and product-description material.

Mellanox has no problem evangelizing their additional capabilities, such as “extended IP over IB” mode which enables traditional TCP Ethernet hardware offloads for IP over IB packet processing … but then obfuscates and hides the fact that NFS over RDMA was removed … and that “extended IP over IB” mode is incompatible with Connected mode operation and the 65k MTU size.

Mellanox … please listen. From a competitive standpoint … limiting your customers to only using standard NFS4 over TCP using IP over IB (even with “extended mode”) yields sub-par performance from our testing using 100Gbit IB and very fast 4.7 GHz servers and clients.

Our next proof-of-concept exercise will be testing a “supported” NFS over RDMA solution from a non-Mellanox vendor using 100Gbit IB or faster. From our sub-par results using non-RDMA NFS on Mellanox OFED, we expect about 2x better throughput using NFS over RDMA. We have reached about 4,500 MB/sec using normal NFS4 over TCP (with IP over IB) … but on a 100Gbit IB link, the expectations are higher.

I agree with Mr Puzio that Mellanox’s decision to remove NFS over RDMA support from OFED is a poor one. Even Microsoft now supports SMB over RDMA. RDMA over Ethernet brings the full set of RDMA-centric efficiency advantages to Ethernet. Mellanox’s non-support of NFS over RDMA is a significant step backwards.

This lack of NFS over RDMA support will likely disqualify Mellanox from competing for future IB-based clusters.

Regards,

Dave B

As mentioned in the Release Notes, NFS over RDMA has been removed from Mellanox OFED version 4.0 and newer. You might try the previous 3.4 version of Mellanox OFED, which should work pretty well with ConnectX-3, or continue to use the inbox version of the driver.


I just noticed that links to my post with a solution disappeared during the forum platform migration (at least I can’t see them). What’s worse, if you google it, you get a link to the old forum that no longer works.

So here is the current link:

How to use NFS over RDMA with MLNX_OFED [solution]