Where are the network protocol headers packed in RoCE v2?

Hi, I recently began studying the driver source code of MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64.

When I got to the network protocol stack, I ran into a problem, and I am hoping for some guidance from the community.

Here is my question:

I am using the verbs API (the black line in picture 2 below) and am familiar with its whole procedure from reading the source code and the manual, but I am not familiar with the abstraction layer below it.

So I want to know:

How does RoCE v2 pack the UDP and IP headers into the packet? I mean where it is done: I haven't found the relevant code, though I do have some clues (see below). I'm also not sure whether this procedure is done by this driver or by the system network stack. Does anybody know? I would be glad to learn from you!

1. Source code from MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64/MLNX_OFED_SRC-4.0-2.0.0.1/SRPMS/libmlx5-1.2.1mlnx1/src/mlx5.c:

2. Some explanation of RoCEv2

  1. RoCEv2 packet format

Thanks for helping.

Following your pointer, I located the kernel-layer counterpart of ibv_modify_qp. It does set some attributes of the QP context and encapsulates the UDP source port into the path attributes of the QP.

But there are some further questions that confuse me. May I ask for your advice?

They are about how the source UDP port is selected for a QP. The documentation says that when using Reliable Connection (RC) RDMA, the source UDP port is scrambled per QP: https://community.mellanox.com/s/article/roce-v2-considerations

But in practice, what “scramble” means is not so clear.

I did a simple experiment to verify the theory.

exp 1. Create 1 QP to transfer data and capture the network packets.

exp 2. Create 10 QPs to transfer data and capture the network packets.

exp 3. Create 100 QPs to transfer data and capture the network packets.

Result:

exp 1: All packets have the same source UDP port.

exp 2: All packets have the same source UDP port.

exp 3: There are 3 source UDP ports.

**Guess:** RoCEv2 load balancing influences the port selection.

**Question:** How is the port selection done?

The following reference is what I got from the driver source code; it shows the procedure for setting the attributes of a QP.

Setting the UDP source port looks like a hint-like method (that is my impression). It really puzzles me. Thanks for taking the time to read this.

Hi, haonan qiu https://community.mellanox.com/s/profile/0051T000008EaJFQA0

We use -q 6 and found that the queue pair numbers range from 275 to 280 (0x000113-0x000118), but there is only one UDP port, 49152. Does that mean all the queue pairs are mapped to a single UDP port?

Hi Haiying.

It seems that there is nothing wrong.

I ran several simple experiments and found something very strange.

First, I tested 20 QPs with ib_write_bw -d mlx5_0 -D 10 -q 20 and checked the packets in Wireshark. They all had the same UDP source port: 53248.

Then I tested 2 QPs with the same tool and checked the packets in Wireshark. Again they all had the same UDP source port: 53248.

Next, I tested 2 QPs with ib_send_bw -d mlx5_0 -D 2 -q 2 and checked the packets in Wireshark. It gave the same result…

This was confusing.

But then I ran one last test, ib_send_bw -d mlx5_0 -q 10 -b, and things finally looked normal.

Result: there are 4 different UDP source ports (an excerpt of the full result).

To exclude the bidirectional effect, I ran another test without -b.

Result: there are still four different UDP source ports.

To exclude the difference between SEND and WRITE, I ran another experiment…

Result:

In the end, it seems that there is a QP-to-port mapping algorithm that allocates ports to QPs according to the system load (my guess).

At the beginning there was only one port used for network communication, but as the number of tests increased, more ports were gradually allocated (something like a warm-up procedure). So you may need to run more experiments…

What kind of card do you use? ConnectX-4?

Hello, here are the details of my device. The two devices are the same.

CA 'mlx5_0'
    CA type: MT4119
    Number of ports: 1
    Firmware version: 16.20.1010
    Hardware version: 0
    Node GUID: 0x248a070300b59626
    System image GUID: 0x248a070300b59626
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x04010000
        Port GUID: 0x268a07fffeb59626
        Link layer: Ethernet

The IP and UDP headers are encapsulated by hardware.

Each QP has its own context. After a QP is created, software sets the attributes of the QP context via ibv_modify_qp. The hardware uses these attributes to build the headers.
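To make the layout concrete, here is a rough Python sketch of the UDP header and the InfiniBand Base Transport Header (BTH) that follow the IP header in a RoCEv2 packet. UDP destination port 4791 is the well-known RoCEv2 port; the function names, the source port, the QP number and the PSN are made-up illustration values, and the trailing ICRC is omitted. It only illustrates which fields exist on the wire, not how the driver or firmware fills them in.

```python
import struct

ROCEV2_UDP_DPORT = 4791  # well-known UDP destination port for RoCEv2


def build_udp_header(src_port, payload_len):
    """Pack an 8-byte UDP header (checksum left as 0 in this sketch)."""
    length = 8 + payload_len
    return struct.pack("!HHHH", src_port, ROCEV2_UDP_DPORT, length, 0)


def build_bth(opcode, dest_qp, psn, pkey=0xFFFF):
    """Pack a 12-byte BTH: opcode, flags, P_Key, 24-bit dest QP, 24-bit PSN."""
    flags = 0  # SE, M, PadCnt and TVer bits all left at zero
    return struct.pack("!BBHII", opcode, flags, pkey,
                       dest_qp & 0xFFFFFF, psn & 0xFFFFFF)


def parse_rocev2(udp_datagram):
    """Return (src_port, dst_port, dest_qp, psn) from UDP header + BTH."""
    src_port, dst_port, _length, _csum = struct.unpack_from(
        "!HHHH", udp_datagram, 0)
    _opcode, _flags, _pkey, qp_word, psn_word = struct.unpack_from(
        "!BBHII", udp_datagram, 8)
    return src_port, dst_port, qp_word & 0xFFFFFF, psn_word & 0xFFFFFF


if __name__ == "__main__":
    # Hypothetical values: source port 49152, QP 0x113, PSN 1,
    # opcode 0x0A (RC RDMA WRITE Only).
    bth = build_bth(0x0A, 0x113, 1)
    datagram = build_udp_header(49152, len(bth)) + bth
    print(parse_rocev2(datagram))
```

The source port is the only field here that varies for load balancing; the destination QP number in the BTH is what actually identifies the connection on the receiving side.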

This is expected behavior.

An HCA can support up to 16 million QPs, but there are only 64K UDP port numbers, and some UDP ports are reserved for specific applications. So the mapping between QPs and source UDP ports is not 1:1 but N:1: a group of QPs always shares one source UDP port. The main purpose of scrambling the source UDP port is load balancing in the network, and an N:1 mapping works for that.
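The arithmetic behind the N:1 mapping can be sketched as follows. QP numbers are 24 bits wide and UDP ports 16 bits, so if every QP were in use, at least 256 of them would have to share a port. The `toy_sport` function below is a purely hypothetical hash, not the driver's or firmware's algorithm; the port range 49152-65535 it targets is also an assumption made for illustration.

```python
QPN_BITS = 24   # QP numbers are 24-bit values (about 16 million QPs)
PORT_BITS = 16  # UDP port numbers are 16-bit values (64K ports)

PORT_MIN = 0xC000  # 49152; assumed lower bound of the source port range


def min_qps_per_port(num_qps=1 << QPN_BITS, num_ports=1 << PORT_BITS):
    """Pigeonhole bound: some port must carry at least this many QPs."""
    return -(-num_qps // num_ports)  # ceiling division


def toy_sport(qpn):
    """Hypothetical hash: fold a 24-bit QPN into 14 bits, then add PORT_MIN."""
    folded = (qpn ^ (qpn >> 14)) & 0x3FFF
    return PORT_MIN | folded


if __name__ == "__main__":
    print(min_qps_per_port())
    # QP numbers 0x113..0x118 are the ones reported earlier in this thread.
    for qpn in range(0x113, 0x119):
        print(hex(qpn), toy_sport(qpn))
```

Any mapping of this kind can yield at most as many distinct source ports as there are QPs, and it may yield fewer; how many the real hardware produces depends on its own algorithm.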

Hi, HuiLi.

Your experiment shows that, in that case, all QPs are mapped to a single UDP port.

According to our experiments, the mapping is dynamic and changes with some factors (network load and QP creation time, I guess).

If you want to see it, you can run more experiments: change the number of QPs, or run several sample processes separately at the same time, and capture the packets.

There was a bug in how the driver/firmware generates the UDP source port. Could you please try the latest mlnx_ofed-4.1?

Thanks for replying.

After updating to the latest mlnx_ofed-4.1, it seems to be working normally.

But… I did more experiments.

exp 1. Create 1 QP to transfer data and capture the network packets.

exp 2. Create 2 QPs to transfer data and capture the network packets.

exp 3. Create 5 QPs to transfer data and capture the network packets.

exp 4. Create 10 QPs to transfer data and capture the network packets.

exp 5. Create 20 QPs to transfer data and capture the network packets.

Result:

exp 1: All packets have the same source UDP port.

exp 2: There are 2 source UDP ports.

exp 3: There are 5 source UDP ports.

exp 4: There are more than 5 source UDP ports. (The file is too large; I can't capture all the packets.)

exp 5: There are more than 8 source UDP ports. (The file is too large; I can't capture all the packets.)

It seems that when the number of QPs increases, the number of UDP source ports doesn't always increase with it.

Or is the scramble mechanism a multiplexing scheme rather than a one-to-one mapping?

Update:

For exp 4 and exp 5, the number of source UDP ports never equals the number of queue pairs, no matter how long the sniffer captures the network packets.

Hey haonan,

I am having a similar issue. Do you have any advice? On my side, I tried assigning a number to the -q parameter, but I could not get different UDP source ports, only one (49152). I am using the newest MLNX-OFED 4.2.

As for the source file mlx5.c, do I need to compile or edit it to make it work?

–haiying

No need to modify that source file.

A few things need to be clarified first.

Which device do you use? ConnectX-5?

Are you using the RoCEv2 protocol?

How did you determine that only port 49152 is used?

I assigned several different numbers of QPs, generated packets for about 30 s, and at the same time captured and saved them to a file with tcpdump.

Then I analyzed them in Wireshark by listing the unique UDP source ports and found several different source ports.
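The "unique UDP source port" step can also be scripted. Below is a minimal Python sketch that reads a classic pcap file (the format tcpdump writes by default), keeps only IPv4/UDP packets whose destination port is 4791 (RoCEv2), and counts packets per source port. It assumes an Ethernet capture with at most one VLAN tag and does not handle pcapng or IPv6; the function names and the file name in the usage note are hypothetical.

```python
import struct
from collections import Counter

ROCEV2_UDP_DPORT = 4791  # well-known UDP destination port for RoCEv2


def _rocev2_sport(frame):
    """Return the UDP source port of a RoCEv2 Ethernet frame, else None."""
    if len(frame) < 14:
        return None
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    ip_off = 14
    if ethertype == 0x8100 and len(frame) >= 18:  # single 802.1Q VLAN tag
        ethertype = struct.unpack_from("!H", frame, 16)[0]
        ip_off = 18
    if ethertype != 0x0800 or len(frame) < ip_off + 20:  # IPv4 only
        return None
    ihl = (frame[ip_off] & 0x0F) * 4  # IPv4 header length in bytes
    if frame[ip_off + 9] != 17:       # IP protocol 17 = UDP
        return None
    udp_off = ip_off + ihl
    if len(frame) < udp_off + 8:
        return None
    sport, dport = struct.unpack_from("!HH", frame, udp_off)
    return sport if dport == ROCEV2_UDP_DPORT else None


def count_rocev2_sports(pcap_bytes):
    """Return Counter {udp_src_port: packet_count} for a classic pcap."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic (microsecond) pcap file")
    ports = Counter()
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_usec, incl_len, orig_len
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII",
                                               pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        sport = _rocev2_sport(frame)
        if sport is not None:
            ports[sport] += 1
    return ports
```

A possible use, with `capture.pcap` standing in for whatever file tcpdump wrote: read the file in binary mode with `open("capture.pcap", "rb")`, pass its contents to `count_rocev2_sports`, and compare the number of keys in the result with the value given to -q.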

Hope this is useful to you.

Thanks for your response!

I am using MLNX_OFED_Linux 4.2 and, yes, RoCEv2. I use ib_write_bw to test RDMA performance, pass -q a number (20, for example), and capture and save the packets to a file with tcpdump. In Wireshark I see that the UDP source ports are all the same, 49152, so I would like to know how I can get different source ports. Do I need to modify any files, such as mlx5.c, or is any additional configuration required?

I really appreciate it!

–Haiying