Is it possible to transfer a large file (~25GB) between two computers using RDMA UD, write the file to disk, and have a bitwise copy of the original file on the receiving computer?

I have been trying to do this with the following setup:

Both computers are running RHEL 7.6 with Mellanox MCX515A-CCAT NICs.

OFED version is 4.7 (taken from rpm filenames) and was installed with the mlnxofedinstall script.

I’ve been trying to transfer a 25GB file using 1 QP, on the assumption that the data would then arrive in order.

The message size is 4kB.
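To give a sense of scale, here is a small sketch of how many messages a transfer like this needs. The helper name messages_needed is made up for illustration, and the file size is assumed to be exactly 25 * 10^9 bytes, which is only an approximation of my file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of fixed-size messages needed to cover
 * file_size bytes, rounding up so a final partial message is counted. */
static uint64_t messages_needed(uint64_t file_size, uint64_t msg_size)
{
    assert(msg_size > 0);
    return file_size / msg_size + (file_size % msg_size != 0);
}
```

With the assumed 25,000,000,000 bytes and 4096-byte messages this comes to 6,103,516 messages, the last one carrying only 2560 bytes.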

I’ve been using the perftest program ib_send_bw as a starting point. It was recently downloaded from GitHub. The client and server are run with the following commands. The program finishes without errors.

client:

./ib_send_bw -R -d mlx5_0 -c UD -i 1 -F -q 1 --mmap=myfile --report_gbits -n 1000 10.10.10.3

server:

./ib_send_bw -R -r 50 -d mlx5_0 -c UD -i 1 -q 1 -F -n 1000 --report_gbits

The ctx->buf and ctx->buff_size have been increased to match the size of the file I am attempting to send.

On the client, the sg_list.addr is advanced by sg_list.length after post_send_func_pointer is called.

    sg_l_tmp = ctx->wr[index*user_param->post_list].sg_list;
    if (sg_l_tmp->addr < ctx->my_addr[index] + ctx->buff_size) {
        ctx->my_addr[index*user_param->post_list] += sg_l_tmp->length;
        ctx->wr[index*user_param->post_list].sg_list->addr += sg_l_tmp->length;
    }
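To make the intent of the snippet above clearer, here is a standalone sketch of the advance logic I am aiming for, without any perftest types. The struct send_cursor and the function cursor_next are hypothetical names invented for this example; they are not part of perftest or libibverbs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of perftest: walks a buffer in fixed-size chunks. */
struct send_cursor {
    uint64_t base;   /* start address of the registered buffer */
    uint64_t size;   /* total number of bytes to send */
    uint64_t offset; /* bytes already handed out */
    uint32_t chunk;  /* message size, e.g. 4096 */
};

/* Fills addr/len for the next message to post.
 * Returns 1 if a message is available, 0 once the buffer is exhausted. */
static int cursor_next(struct send_cursor *c, uint64_t *addr, uint32_t *len)
{
    assert(c->chunk > 0);
    if (c->offset >= c->size)
        return 0;

    uint64_t remaining = c->size - c->offset;
    *addr = c->base + c->offset;
    /* The last message may be shorter than a full chunk. */
    *len = remaining < c->chunk ? (uint32_t)remaining : c->chunk;
    c->offset += *len;
    return 1;
}
```

In this sketch the bound is checked against a single base address and the length of the final message is clamped, so the address never runs past the end of the buffer.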

Using print statements, I can see that the data at sg_l_tmp->addr match what I expect. I am printing out the first 100B of data at sg_l_tmp->addr. I also write out the contents of ctx->buf to a file for later comparison.

On the server side, the data are being written to the same 4kB memory location over and over again. After the call to ibv_post_recv, the first 140B (40B of header and 100B of payload) are written to stdout for debugging. The full payload is also copied into ctx->buf, which will be written out to a file later.

Side question: Can I control where the data are initially written on the server? If so, is that done on the client side (by advancing something like

ctx->wr[index*user_param->post_list].wr.rdma.remote_addr)

or on the server side (by advancing something like ctx->rwr[wc_id].sg_list->addr) ?
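If the server side turns out to be the right place, I imagine the address arithmetic would look roughly like the sketch below. This is a guess, not something I have working: recv_slot_addr and recv_payload_addr are hypothetical helpers, and the 40-byte header length is simply what I observe in front of each received payload.

```c
#include <assert.h>
#include <stdint.h>

#define UD_HDR_LEN 40u /* header bytes seen in front of every received payload */

/* Hypothetical helper: address of the i-th receive slot when every receive
 * gets its own (header + payload) region instead of reusing one slot. */
static uint64_t recv_slot_addr(uint64_t base, uint64_t i, uint32_t payload_len)
{
    assert(payload_len > 0);
    return base + i * (uint64_t)(UD_HDR_LEN + payload_len);
}

/* Where the payload of the i-th message would start inside that slot. */
static uint64_t recv_payload_addr(uint64_t base, uint64_t i, uint32_t payload_len)
{
    return recv_slot_addr(base, i, payload_len) + UD_HDR_LEN;
}
```

With this layout the payloads would not be contiguous in memory, because each one is preceded by its header, so they would still need to be copied out before writing the file.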

When I look at the 100B of payload for the first ~10 transfers and compare the payload from client and server, sometimes client and server agree exactly, sometimes they partially agree (for example, the first 25 or 50 bytes agree), and sometimes they don’t agree at all.
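The comparison I am doing by eye amounts to something like the following sketch, which reports the offset of the first byte that differs. The function name first_mismatch is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the first differing byte between a and b,
 * or n if the first n bytes are identical. */
static size_t first_mismatch(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return i;
    }
    return n;
}
```

On the server, the pointer passed in would need to start 40 bytes into the receive buffer so that the header is skipped and only payload is compared.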

With the unmodified ib_send_bw code, it was sending the first 4kB of data over and over again. By spot checking, the data appeared to be fine.

So I am wondering if the differences I am seeing between the payload on the client and the server are due to mistakes I am making or if what I am trying to do really can’t be done with RDMA UD.

I appreciate any thoughts and suggestions.

Thanks,

Terry