SX6036 IPoIB speed issue

Hi,

I was hoping that someone would be able to help me.

I have 15 HP DL380 Gen9 servers running Windows Server 2012 R2, each with a dual-port HP ConnectX-3 VPI card, connected to a Mellanox SX6036 switch.

The line speed is correctly displayed as 32Gbps (QDR), but we are not getting anywhere near that performance. Real-world RDMA speeds seem to max out at 25Gbps (which is OK but could be better), but the maximum speed we seem to get with IPoIB is 1Gbps. Below is an ntttcp test:

c:\temp\NTttcp-v5.31\x64>ntttcp.exe -r -m 8,*,10.167.255.111 -rb 2M -a 16 -t 30
Copyright Version 5.31
Network activity progressing…

Thread  Time(s)  Throughput(KB/s)  Avg B / Compl
======  =======  ================  =============
     0   30.062          1896.880      65536.000
     1   30.046          1414.365      63434.262
     2   30.124          1459.567      64410.918
     3   30.093          2473.399      65536.000
     4   30.062          1847.914      65536.000
     5   30.062          2452.531      65365.777
     6   30.047          2726.406      63310.495
     7   30.047          2890.476      61291.891

Totals:

      Bytes(MEG)  realtime(s)  Avg Frame Size  Throughput(MB/s)
================  ===========  ==============  ================
      503.877392       30.069        3966.649            16.757

Throughput(Buffers/s)  Cycles/Byte     Buffers
=====================  ===========  ==========
              268.118      395.755    8062.038

DPCs(count/s)  Pkts(num/DPC)  Intr(count/s)  Pkts(num/intr)
=============  =============  =============  ==============
       81.845         54.124       5762.646           0.769

Packets Sent  Packets Received  Retransmits  Errors  Avg. CPU %
============  ================  ===========  ======  ==========
        7198            133199            2       6      15.137
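
For anyone wanting to reproduce the check, confirming the reported link speed and adapter status from PowerShell looks something like this (the adapter name below is just a placeholder, so adjust it to the IPoIB interface name on the host):

# Placeholder interface name - run Get-NetAdapter on its own first to find the IPoIB adapter
Get-NetAdapter -Name "Ethernet (IPoIB)" | Format-List Name, InterfaceDescription, LinkSpeed, Status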

We have a similar but slightly different environment: 9 HP DL380 servers with ConnectX-3 cards connected to an IS5022 switch. This environment performs as expected:

c:\temp\NTttcp-v5.31\x64>ntttcp.exe -s -m 8,*,192.168.84.10 -l 128k -a 2 -t 30
Copyright Version 5.31
Network activity progressing…

Thread  Time(s)  Throughput(KB/s)  Avg B / Compl
======  =======  ================  =============
     0   30.000        469060.267     131072.000
     1   30.000        359970.133     131072.000
     2   30.000        446084.267     131072.000
     3   30.000        437909.333     131072.000
     4   30.000        348608.000     131072.000
     5   30.000        387993.600     131072.000
     6   30.000        348654.933     131072.000
     7   30.000        357444.267     131072.000

Totals:

      Bytes(MEG)  realtime(s)  Avg Frame Size  Throughput(MB/s)
================  ===========  ==============  ================
    92452.875000       30.000        4037.074          3081.762

Throughput(Buffers/s)  Cycles/Byte     Buffers
=====================  ===========  ==========
            24654.100        0.614  739623.000

DPCs(count/s)  Pkts(num/DPC)  Intr(count/s)  Pkts(num/intr)
=============  =============  =============  ==============
    51664.567          1.719      72494.233           1.225

Packets Sent  Packets Received  Retransmits  Errors  Avg. CPU %
============  ================  ===========  ======  ==========
    24013397           2663789           15       6       6.892

The only major difference between the two environments is the switch. I’m pretty sure that the SX6036 is configured correctly, but there must be something wrong if we are getting a throughput of 16 MB/s compared with 3081 MB/s!

Any help on this issue would be much appreciated. I can provide switch config and more details if required.

Thanks,

Zak

Hello Zak,

Check/Do the following:

  1. Make sure you have the latest Mellanox Driver/Firmware (see the PowerShell sketch after this list)

  2. Make sure you are able to reach line-rate speed with “nd_write_bw.exe”, which comes with Mellanox WinOF (http://www.mellanox.com/page/products_dyn?product_family=32&mtag=windows_sw_drivers)

  3. Please consult our Performance Tuning Guide: http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf

  4. Run ntttcp between two servers back-to-back (without the SX6036 switch). What are your performance results?
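
For item 1, a quick way to confirm which driver version the OS has actually loaded and that RDMA is enabled on the IPoIB adapter is something like the following PowerShell (a minimal sketch; the adapter name is a placeholder, so adjust it on your hosts):

# Placeholder adapter name - run Get-NetAdapter on its own first to find the IPoIB interface
Get-NetAdapter -Name "Ethernet (IPoIB)" | Format-List Name, DriverVersion, DriverDate, LinkSpeed
# RDMA should show as Enabled on the adapter
Get-NetAdapterRdma -Name "Ethernet (IPoIB)" | Format-List Name, Enabled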

Cheers,

~R

Hi Rage,

Thanks very much for your reply. Apologies for my late response, it’s been a busy couple of weeks.

  1. The driver is 5.22.12433.0 and the firmware is 2.36.5000; these were the latest as of a couple of weeks ago.

  2. nd_write_bw results:

#qp    #bytes  #iterations  MR [Mmps]   Gb/s  CPU Util.
  0   1048576        22712      0.004  31.75     100.00

I’m not sure why the CPU is displayed at 100% as it is actually using about 4%.

  3. I have looked at the performance tuning PDF before and have run the balanced tuning option in the driver configuration.

  4. These are the results when the servers are plugged into each other back to back (even worse!):

c:\temp\NTttcp-v5.31\x64>ntttcp.exe -s -m 8,*,192.168.100.11 -l 128k -a 2 -t 30
Copyright Version 5.31
Network activity progressing…

Thread  Time(s)  Throughput(KB/s)  Avg B / Compl
======  =======  ================  =============
     0   30.114           773.594     131072.000
     1   29.551            60.641     131072.000
     2   30.021          2890.776     131072.000
     3   30.911            62.114     131072.000
     4   30.036           762.818     131072.000
     5   29.895           706.473     131072.000
     6   31.536            60.883     131072.000
     7   29.926           774.176     131072.000

Totals:

      Bytes(MEG)  realtime(s)  Avg Frame Size  Throughput(MB/s)
================  ===========  ==============  ================
      178.625000       30.000        3903.587             5.954

Throughput(Buffers/s)  Cycles/Byte     Buffers
=====================  ===========  ==========
               47.633      327.094    1429.000

DPCs(count/s)  Pkts(num/DPC)  Intr(count/s)  Pkts(num/intr)
=============  =============  =============  ==============
      583.767          0.419       4805.100           0.051

Packets Sent  Packets Received  Retransmits  Errors  Avg. CPU %
============  ================  ===========  ======  ==========
       47982              7338         3356       6       4.445
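
One thing that stands out to me in that back-to-back run is the retransmit count (3356 retransmits against 47982 packets sent), so I am also going to dump the adapter's advanced settings on one of these hosts and on a host in the working IS5022 environment and compare them. A rough sketch of what I plan to run (the interface name is a placeholder):

# Placeholder interface name - substitute the IPoIB adapter on each host being compared
Get-NetAdapterAdvancedProperty -Name "Ethernet (IPoIB)" |
    Sort-Object DisplayName |
    Format-Table DisplayName, DisplayValue -AutoSize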

As the switch has now been ruled out as the cause, I will also contact HP to see if they can offer some support.

Thanks again for your help with this.

Zak