Documentation for using Linux, NFS, and RDMA?

I’ve read the Linux kernel IPoIB and NFS-RDMA documentation. It doesn’t seem particularly current, since it mentions 2.6 kernels.

I’m looking for documentation on running NFS-RDMA with Linux kernel 3.13 (which is used by Ubuntu 14.04 LTS) and Mellanox FDR cards and switch.

I’ve got RDMA and IPoIB working: ping, ibv_srq_pingpong, udaddy, rdma_server, ib_send_bw, rping, and ucmatose all show client <-> server connectivity.

Even the mount works:

root@c2-31:/# mount -o rdma,port=20049 10.18.3.34:/export/1 /mnt/ib

root@c2-31:/#

I can cd around, but whenever I try to actually read a file I get:

root@c2-31:/mnt/ib/bill# ls -alh testfile

-rw-r--r-- 1 root root 1.0G Feb 20 19:31 testfile

root@c2-31:/mnt/ib/bill# cat testfile

cat: testfile: Input/output error

On both client and server I do have:

ulimit -l

unlimited

tail -4 /etc/security/limits.conf

* soft memlock unlimited

* hard memlock unlimited

root soft memlock unlimited

root hard memlock unlimited
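
A quick way to confirm these settings on both machines is sketched below. The function name `memlock_entries` is hypothetical; it only lists the memlock lines from a limits.conf-style file so they can be compared against what `ulimit -l` reports for the current shell.

```shell
#!/bin/sh
# memlock_entries FILE: print "<domain> <type> <value>" for every memlock
# line in a limits.conf-style file, skipping commented-out lines.
# (Hypothetical helper name, introduced for illustration only.)
memlock_entries() {
    awk '$1 !~ /^#/ && $3 == "memlock" { print $1, $2, $4 }' "$1"
}

# Limit that applies to the current shell.
echo "current max locked memory: $(ulimit -l)"

# Path as used in the post; only read it if it is present and readable.
if [ -r /etc/security/limits.conf ]; then
    memlock_entries /etc/security/limits.conf
fi
```

Note that limits.conf is applied at login through PAM, so a daemon started at boot may not inherit these values.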

My goal is to get this working and then write a Puppet module to manage the client and server setup for NFS-RDMA.

Try running with NFSv3.

If there are still problems, run "rpcdebug -m rpc -s all" on both client and server and post the dmesg output from each.
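
The suggestion above could be scripted roughly as follows. This is only a sketch: the server address, export path and mount point are copied from the original post, `vers=3` is the usual NFS mount option for forcing NFSv3, and the `run` helper and `DRY_RUN` variable are hypothetical names added here so the sequence can be previewed without root.

```shell
#!/bin/sh
# Sketch: retry the RDMA mount with NFSv3 and capture RPC debug output.
# SERVER, EXPORT and MNT are taken from the post; adjust for your site.
SERVER=${SERVER:-10.18.3.58}
EXPORT=${EXPORT:-/export/1}
MNT=${MNT:-/mnt/ib}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands

# run: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

nfs3_rdma_debug() {
    run umount "$MNT"
    run mount -o rdma,port=20049,vers=3 "$SERVER:$EXPORT" "$MNT"
    run dmesg -c                    # clear the kernel ring buffer
    run rpcdebug -m rpc -s all      # enable all RPC debug flags
    run md5sum "$MNT/testfile"      # reproduce the failing read
    run rpcdebug -m rpc -c all      # clear the RPC debug flags again
    run dmesg                       # messages produced by the read
}

nfs3_rdma_debug
```

With the default `DRY_RUN=1` the script only prints the commands. Running it as root with `DRY_RUN=0` on the client executes them; the two `rpcdebug` lines would be run on the server as well.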

Message from @Bill Broadley

RDMA bandwidth testing works:

server# ib_send_bw -d mlx4_0 -i 1 -F -b


Send Bidirectional BW Test

Connection type : RC

Inline data is used up to 400 bytes message

local address: LID 0x02, QPN 0x01c6, PSN 0xef4fed

remote address: LID 0x04, QPN 0x004c, PSN 0x98d931

Mtu : 2048


#bytes #iterations BW peak[MB/sec] BW average[MB/sec]


client:

ib_send_bw -d mlx4_0 -i 1 -F 10.18.3.31 -b


Send Bidirectional BW Test

Connection type : RC

Inline data is used up to 400 bytes message

local address: LID 0x04, QPN 0x004c, PSN 0x98d931

remote address: LID 0x02, QPN 0x01c6, PSN 0xef4fed

Mtu : 2048


#bytes #iterations BW peak[MB/sec] BW average[MB/sec]

Conflicting CPU frequency values detected: 1200.000000 != 2101.000000

Test integrity may be harmed !

Conflicting CPU frequency values detected: 1200.000000 != 2101.000000

Test integrity may be harmed !

Warning: measured timestamp frequency 2099.8 differs from nominal 1200 MHz

65536 1000 11457.31 11456.84


root@nas-2-1:~#

Does the lack of results from the server side look like a problem?

On the client I did:

mount -o rdma,port=20049 10.18.3.58:/export/1 /mnt/ib

I can cd around, and cat/touch small files:

cat .gnuplot-wxt

raise=1

persist=0

ctrl=0

rendering=2

hinting=100

But not anything large:

root@c2-31:/mnt/ib# md5sum testfile

md5sum: testfile: Input/output error

Message from @Bill Broadley (continued)

I’ve not tried NFSv3 yet. Does NFSv4 have a known problem with RDMA? The same pair of machines has NFSv4 working over GigE. I tried your command and got:

dmesg -c

cat testfile

cat: testfile: Input/output error

rpcdebug -m rpc -s all

dmesg

[242031.688205] RPC: looking up Generic cred

[242031.688213] RPC: looking up Generic cred

[242031.688216] RPC: looking up Generic cred

[242031.688222] RPC: new task initialized, procpid 12386

[242031.688224] RPC: allocated task ffff880845c33000

[242031.688236] RPC: 110 __rpc_execute flags=0x81

[242031.688241] RPC: 110 call_start nfs4 proc OPEN (async)

[242031.688243] RPC: 110 call_reserve (status 0)

[242031.688246] RPC: 110 reserved req ffff880853255600 xid dfed959b

[242031.688249] RPC: 110 call_reserveresult (status 0)

[242031.688251] RPC: 110 call_refresh (status 0)

[242031.688253] RPC: 110 refreshing UNIX cred ffff881050ef20c0

[242031.688255] RPC: 110 call_refreshresult (status 0)

[242031.688257] RPC: 110 call_allocate (status 0)

[242031.688263] RPC: xprt_rdma_allocate: size 6344 too large for buffer[1024]: prog 100003 vers 4 proc 1

[242031.688269] RPC: xprt_rdma_allocate: size 6344, request 0xffff88082b81c000

[242031.688271] RPC: 110 call_bind (status 0)

[242031.688273] RPC: 110 call_connect xprt ffff880846752000 is connected

[242031.688275] RPC: 110 call_transmit (status 0)

[242031.688277] RPC: 110 xprt_prepare_transmit

[242031.688279] RPC: 110 xprt_cwnd_limited cong = 0 cwnd = 4096

[242031.688281] RPC: 110 rpc_xdr_encode (status 0)

[242031.688283] RPC: 110 marshaling UNIX cred ffff881050ef20c0

[242031.688286] RPC: 110 using AUTH_UNIX cred ffff881050ef20c0 to wrap rpc data

[242031.688290] RPC: 110 xprt_transmit(220)

[242031.688293] RPC: rpcrdma_inline_pullup: pad 0 destp 0xffff88082b81d83c len 220 hdrlen 220

[242031.688297] RPC: rpcrdma_register_frmr_external: Using frmr ffff880845ccda90 to map 1 segments

[242031.688301] RPC: rpcrdma_create_chunks: reply chunk elem 3124@0x82b81e3f4:0xe006730f (last)

[242031.688305] RPC: rpcrdma_marshal_req: reply chunk: hdrlen 48 rpclen 220 padlen 0 headerp 0xffff88082b81d100 base 0xffff88082b81d760 lkey 0x8000

[242031.688309] RPC: 110 xmit complete

[242031.688311] RPC: 110 sleep_on(queue "xprt_pending" time 4355352568)

[242031.688313] RPC: 110 added to queue ffff880846752258 "xprt_pending"

[242031.688315] RPC: 110 setting alarm for 60000 ms

[242031.688318] RPC: wake_up_first(ffff880846752190 "xprt_sending")

[242031.688333] RPC: rpcrdma_event_process: event rep ffff880845ccda90 status 0 opcode 8 length 4294936584

[242031.688340] RPC: rpcrdma_event_process: event rep (null) status 0 opcode 0 length 4294936584

[242031.688473] RPC: rpcrdma_event_process: event rep ffff88085436a000 status 0 opcode 80 length 48

[242031.688479] RPC: rpcrdma_reply_handler: reply 0xffff88085436a000 completes request 0xffff88082b81c000

[242031.688479] RPC request 0xffff880853255600 xid 0x9b95eddf

[242031.688484] RPC: rpcrdma_count_chunks: chunk 308@0x82b81e3f4:0xe006730f

[242031.688486] RPC: rpcrdma_reply_handler: xprt_complete_rqst(0xffff880846752000, 0xffff880853255600, 308)

[242031.688489] RPC: 110 xid dfed959b complete (308 bytes received)

[242031.688492] RPC: 110 __rpc_wake_up_task (now 4355352568)

[242031.688493] RPC: 110 disabling timer

[242031.688496] RPC: 110 removed from queue ffff880846752258 "xprt_pending"

[242031.688500] RPC: __rpc_wake_up_task done

[242031.688544] RPC: 110 __rpc_execute flags=0x881

[242031.688547] RPC: 110 call_status (status 308)

[242031.688549] RPC: 110 call_decode (status 308)

[242031.688551] RPC: 110 validating UNIX cred ffff881050ef20c0

[242031.688554] RPC: 110 using AUTH_UNIX cred ffff881050ef20c0 to unwrap rpc data

[242031.688560] RPC: 110 call_decode result 0

[242031.688563] RPC: wake_up_first(ffff881052515e98 "NFSv4.0 transport Slot table")

[242031.688566] RPC: 110 return 0, status 0

[242031.688568] RPC: 110 release task

[242031.688570] RPC: wake_up_first(ffff880846752190 "xprt_sending")

[242031.688573] RPC: xprt_rdma_free: called on 0xffff88085436a000

[242031.688580] RPC: rpcrdma_event_process: event rep ffff880845ccda90 status 0 opcode 7 length 4294936584

[242031.688586] RPC: 110 release request ffff880853255600

[242031.688588] RPC: wake_up_first(ffff880846752320 "xprt_backlog")

[242031.688590] RPC: rpc_release_client(ffff881052517200)

[242031.688604] RPC: 110 freeing task

[242031.688657] RPC: new task initialized, procpid 12386

[242031.688660] RPC: allocated task ffff88085145fb38

[242031.688672] RPC: 111 __rpc_execute flags=0x1

[242031.688678] RPC: 111 call_start nfs4 proc READ (async)

[242031.688681] RPC: 111 call_reserve (status 0)

[242031.688686] RPC: 111 reserved req ffff880853255600 xid e0ed959b

[242031.688689] RPC: 111 call_reserveresult (status 0)

[242031.688692] RPC: 111 call_refresh (status 0)

[242031.688695] RPC: 111 refreshing UNIX cred ffff881050ef20c0

[242031.688697] RPC: 111 call_refreshresult (status 0)

[242031.688699] RPC: 111 call_allocate (status 0)

[242031.688705] RPC: xprt_rdma_allocate: size 684, request 0xffff88082bea8000

[242031.688707] RPC: 111 call_bind (status 0)

[242031.688709] RPC: 111 call_connect xprt ffff880846752000 is connected

[242031.688711] RPC: 111 call_transmit (status 0)

[242031.688713] RPC: 111 xprt_prepare_transmit

[242031.688715] RPC: 111 xprt_cwnd_limited cong = 0 cwnd = 4096

[242031.688717] RPC: 111 rpc_xdr_encode (status 0)

[242031.688719] RPC: 111 marshaling UNIX cred ffff881050ef20c0

[242031.688721] RPC: 111 using AUTH_UNIX cred ffff881050ef20c0 to wrap rpc data

[242031.688724] RPC: 111 xprt_transmit(152)

[242031.688727] RPC: rpcrdma_inline_pullup: pad 0 destp 0xffff88082bea97f8 len 152 hdrlen 152

[242031.688731] RPC: rpcrdma_register_frmr_external: Using frmr ffff880845ccd040 to map 1 segments

[242031.688734] RPC: rpcrdma_create_chunks: write chunk elem 4096@0x8460d0000:0xe006310f (more)

[242031.688737] RPC: rpcrdma_register_frmr_external: Using frmr ffff880845ccda68 to map 1 segments

[242031.688739] RPC: rpcrdma_event_process: event rep ffff880845ccd040 status 0 opcode 8 length 4294936584

[242031.688744] RPC: rpcrdma_create_chunks: write chunk elem 152@0x82bea9974:0xe006720d (last)

[242031.688747] RPC: rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 152 padlen 0 headerp 0xffff88082bea9100 base 0xffff88082bea9760 lkey 0x8000

[242031.688749] RPC: rpcrdma_event_process: event rep ffff880845ccda68 status 0 opcode 8 length 4294936584

[242031.688753] RPC: 111 xmit complete

[242031.688755] RPC: 111 sleep_on(queue "xprt_pending" time 4355352568)

[242031.688757] RPC: 111 added to queue ffff880846752258 "xprt_pending"

[242031.688759] RPC: 111 setting alarm for 60000 ms

[242031.688762] RPC: wake_up_first(ffff880846752190 "xprt_sending")

[242031.688876] RPC: rpcrdma_event_process: event rep ffff88085436a000 status 0 opcode 80 length 112

[242031.688881] RPC: rpcrdma_reply_handler: reply 0xffff88085436a000 completes request 0xffff88082bea8000

[242031.688881] RPC request 0xffff880853255600 xid 0x9b95ede0

[242031.688884] RPC: rpcrdma_count_chunks: chunk 4096@0x8460d0000:0xe006310f

[242031.688887] RPC: rpcrdma_inline_fixup: srcp 0xffff88085436a094 len 60 hdrlen 60

[242031.688890] RPC: rpcrdma_reply_handler: xprt_complete_rqst(0xffff880846752000, 0xffff880853255600, 4156)

[242031.688892] RPC: 111 xid e0ed959b complete (4156 bytes received)

[242031.688894] RPC: 111 __rpc_wake_up_task (now 4355352568)

[242031.688896] RPC: 111 disabling timer

[242031.688898] RPC: 111 removed from queue ffff880846752258 "xprt_pending"

[242031.688900] RPC: __rpc_wake_up_task done

[242031.688905] RPC: 111 __rpc_execute flags=0x801

[242031.688907] RPC: 111 call_status (status 4156)

[242031.688908] RPC: 111 call_decode (status 4156)

[242031.688910] RPC: 111 validating UNIX cred ffff881050ef20c0

[242031.688913] RPC: 111 using AUTH_UNIX cred ffff881050ef20c0 to unwrap rpc data

[242031.688915] RPC: 111 call_decode result 0

[242031.688918] RPC: wake_up_first(ffff881052515e98 "NFSv4.0 transport Slot table")

[242031.688921] RPC: 111 return 0, status 0

[242031.688922] RPC: 111 release task

[242031.688924] RPC: wake_up_first(ffff880846752190 "xprt_sending")

[242031.688927] RPC: xprt_rdma_free: called on 0xffff88085436a000

[242031.688934] RPC: rpcrdma_event_process: event rep ffff880845ccd040 status 0 opcode 7 length 4294936584

[242031.688937] RPC: rpcrdma_event_process: event rep ffff880845ccda68 status 0 opcode 7 length 4294936584

[242031.688941] RPC: 111 release request ffff880853255600

[242031.688943] RPC: wake_up_first(ffff880846752320 "xprt_backlog")

[242031.688945] RPC: rpc_release_client(ffff881052517200)

[242031.689159] RPC: new task initialized, procpid 12386

[242031.689162] RPC: allocated task ffff880845c32c00

[242031.689173] RPC: 112 __rpc_execute flags=0x81

[242031.689177] RPC: 112 call_start nfs4 proc CLOSE (async)

[242031.689179] RPC: 112 call_reserve (status 0)

[242031.689182] RPC: 112 reserved req ffff880853255600 xid e1ed959b

[242031.689184] RPC: 112 call_reserveresult (status 0)

[242031.689185] RPC: 112 call_refresh (status 0)

[242031.689188] RPC: 112 refreshing UNIX cred ffff881050ef20c0

[242031.689190] RPC: 112 call_refreshresult (status 0)

[242031.689192] RPC: 112 call_allocate (status 0)

[242031.689196] RPC: xprt_rdma_allocate: size 3244 too large for buffer[1024]: prog 100003 vers 4 proc 1

[242031.689201] RPC: xprt_rdma_allocate: size 3244, request 0xffff88082b81c000

[242031.689203] RPC: 112 call_bind (status 0)

[242031.689205] RPC: 112 call_connect xprt ffff880846752000 is connected

[242031.689207] RPC: 112 call_transmit (status 0)

[242031.689208] RPC: 112 xprt_prepare_transmit

[242031.689210] RPC: 112 xprt_cwnd_limited cong = 0 cwnd = 4096

[242031.689213] RPC: 112 rpc_xdr_encode (status 0)

[242031.689214] RPC: 112 marshaling UNIX cred ffff881050ef20c0

[242031.689217] RPC: 112 using AUTH_UNIX cred ffff881050ef20c0 to wrap rpc data

[242031.689220] RPC: 112 xprt_transmit(160)

[242031.689223] RPC: rpcrdma_inline_pullup: pad 0 destp 0xffff88082b81d800 len 160 hdrlen 160

[242031.689226] RPC: rpcrdma_register_frmr_external: Using frmr ffff880845ccc5f0 to map 1 segments

[242031.689229] RPC: rpcrdma_create_chunks: reply chunk elem 2760@0x82b81d944:0xe005ef0f (last)

[242031.689233] RPC: rpcrdma_marshal_req: reply chunk: hdrlen 48 rpclen 160 padlen 0 headerp 0xffff88082b81d100 base 0xffff88082b81d760 lkey 0x8000

[242031.689235] RPC: 112 xmit complete

[242031.689238] RPC: 112 sleep_on(queue "xprt_pending" time 4355352568)

[242031.689240] RPC: 112 added to queue ffff880846752258 "xprt_pending"

[242031.689241] RPC: 112 setting alarm for 60000 ms

[242031.689244] RPC: wake_up_first(ffff880846752190 "xprt_sending")

[242031.689262] RPC: rpcrdma_event_process: event rep ffff880845ccc5f0 status 0 opcode 8 length 4294936584

[242031.689417] RPC: rpcrdma_event_process: event rep ffff88085436a000 status 0 opcode 80 length 48

[242031.689424] RPC: rpcrdma_reply_handler: reply 0xffff88085436a000 completes request 0xffff88082b81c000

[242031.689424] RPC request 0xffff880853255600 xid 0x9b95ede1

[242031.689428] RPC: rpcrdma_count_chunks: chunk 132@0x82b81d944:0xe005ef0f

[242031.689431] RPC: rpcrdma_reply_handler: xprt_complete_rqst(0xffff880846752000, 0xffff880853255600, 132)

[242031.689433] RPC: 112 xid e1ed959b complete (132 bytes received)

[242031.689436] RPC: 112 __rpc_wake_up_task (now 4355352568)

[242031.689438] RPC: 112 disabling timer

[242031.689440] RPC: 112 removed from queue ffff880846752258 "xprt_pending"

[242031.689445] RPC: __rpc_wake_up_task done

[242031.689452] RPC: 112 __rpc_execute flags=0x881

[242031.689454] RPC: 112 call_status (status 132)

[242031.689456] RPC: 112 call_decode (status 132)

[242031.689459] RPC: 112 validating UNIX cred ffff881050ef20c0

[242031.689461] RPC: 112 using AUTH_UNIX cred ffff881050ef20c0 to unwrap rpc data

[242031.689467] RPC: 112 call_decode result 0

[242031.689470] RPC: wake_up_first(ffff881052515e98 "NFSv4.0 transport Slot table")

[242031.689474] RPC: 112 return 0, status 0

[242031.689476] RPC: 112 release task

[242031.689478] RPC: wake_up_first(ffff880846752190 "xprt_sending")

[242031.689481] RPC: xprt_rdma_free: called on 0xffff88085436a000

[242031.689489] RPC: rpcrdma_event_process: event rep ffff880845ccc5f0 status 0 opcode 7 length 4294936584

[242031.689493] RPC: 112 release request ffff880853255600

[242031.689495] RPC: wake_up_first(ffff880846752320 "xprt_backlog")

[242031.689497] RPC: rpc_release_client(ffff881052517200)

[242031.689523] RPC: 112 freeing task
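
With this much output it can help to reduce the log to a few lines per RPC. The following is a rough sketch that assumes only the message formats visible in the log above (`call_start`, `xid ... complete (N bytes received)`, and the `xprt_rdma_allocate ... too large for buffer` notice). The function name `rpc_summary` is made up for illustration.

```shell
#!/bin/sh
# rpc_summary: read dmesg-style RPC debug lines on stdin and print, per task,
# the procedure it started and the size of the reply it received, plus any
# "too large for buffer" notices from xprt_rdma_allocate.
# (Hypothetical helper; it relies only on the formats seen in the log above.)
rpc_summary() {
    # Strip the leading [timestamp] so the field positions are stable.
    sed 's/^\[[^]]*\] *//' |
    awk '
        $3 == "call_start"                { print "task", $2, "start", $4, $6 }
        $3 == "xid" && $5 == "complete"   { print "task", $2, "reply", substr($6, 2), "bytes" }
        /too large for buffer/            { print "note: " $0 }
    '
}

# Usage (needs permission to read the kernel log):
#   dmesg | rpc_summary
```

For the OPEN call in the log above this prints `task 110 start nfs4 OPEN` followed later by `task 110 reply 308 bytes`.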

Here’s the mount:

mount | grep rdma

10.18.3.58:/export/1 on /mnt/ib type nfs (rw,rdma,port=20049,vers=4,addr=10.18.3.58,clientaddr=10.18.3.31)
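
Since the reply suggested NFSv3, it is worth confirming which version a mount actually ended up with. Here is a small sketch that assumes the option string format shown in the mount output above; `nfs_vers` is a hypothetical helper name.

```shell
#!/bin/sh
# nfs_vers: read `mount` output lines on stdin and print
# "<mount point> <vers value>" for each NFS mount that lists a vers= option.
# (Hypothetical helper; assumes the "X on Y type nfs (opts)" layout above.)
nfs_vers() {
    sed -n 's/^.* on \([^ ]*\) type nfs[0-9]* (.*vers=\([0-9.]*\).*$/\1 \2/p'
}

# Usage:
#   mount | nfs_vers
```

For the mount shown above this prints `/mnt/ib 4`, so the failing reads were made over NFSv4 rather than NFSv3.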