Unstable operation of WinOFED 4.40.0 with HCA fw 2.30.3200 on Win2012 Server

Hi,

I’ve run into unstable operation of WinOFED 4.40.0 with HCA firmware 2.30.3200 on Windows Server 2012.

I’ve spent the last week going mad trying to get it working well.

I have a system of two Mellanox InfiniScale IV IS5023 switches, three hosts running Windows Server 2012 / Windows Server 2012 R2, and two hosts running Ubuntu Linux 12.04 LTS. All hosts are equipped with Mellanox ConnectX-3 VPI network cards (MCX354A-QCBT). Each host is plugged into both switches; the switches are connected to each other and form a shared fabric. All hosts are based on SuperMicro X9DRW-7TPF mainboards with Intel Xeon E5-2667 v2 CPUs and DDR3-1866 memory.

WinOFED 4.40.0 is installed on Windows Server 2012, and WinOFED 4.55 on Windows Server 2012 R2. Both Linux hosts are routers with the MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64 package installed; their IB interfaces are joined into an active-backup bond using ifenslave. The following modules are loaded on the Linux routers: mlx4_core, mlx4_ib, ib_umad, ib_mad, ib_ipoib, ib_uverbs. All HCAs are flashed with firmware 2.30.3200.
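To double-check that the modules listed above are really loaded on each Linux router, something like the following could be used. This is only a minimal sketch: the helper name `missing_modules` is made up for illustration, and it assumes the usual /proc/modules layout where the module name is the first field on each line. Drivers compiled into the kernel would not show up there.

```python
# Modules expected on the Linux routers (taken from the list in the post).
REQUIRED = ["mlx4_core", "mlx4_ib", "ib_umad", "ib_mad", "ib_ipoib", "ib_uverbs"]


def missing_modules(proc_modules_text, required=REQUIRED):
    """Return the required modules that are absent from /proc/modules text.

    Hypothetical helper: assumes the module name is the first
    whitespace-separated field on every non-empty line.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        # Not a Linux host, or /proc is not mounted.
        text = ""
    print("missing:", missing_modules(text) or "none")
```

Running it on each router should print `missing: none` if the whole stack is loaded.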

As a test I use L3 ICMP ping. Linux-to-Linux communication is fine: a flood ping across the InfiniBand network gives amazing results, rtt min/avg/max/mdev = 0.011/0.013/1.682/0.002 ms. But things look different when Windows is involved.

A flood ping from a Linux host to Windows 2012 gives almost the same good latency numbers (rtt min/avg/max/mdev = 0.022/0.024/2.492/0.021 ms), but the packet loss rate is always about 1-2%. At the same time, ibping shows no packet loss at all and ibdiagnet on Linux shows no warnings or errors, so I conclude that IB itself works well and the issue lies above L2.

So I decided to try Windows 2012 R2 with WinOFED 4.55. It resolved the packet loss but increased latency: rtt min/avg/max/mdev = 0.093/0.102/17.550/0.101 ms. Digging into this, I found that another system of mine has no such issue, and the difference is the HCA firmware version: Windows 2012 with WinOFED 4.40.0 works fine with firmware 2.11.500. But downgrading to fw 2.11.500 on my servers didn’t help; even with the same firmware and software versions I still see packet loss.
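To put the three latency results quoted so far side by side, the rtt lines can be parsed and compared against the Linux-to-Linux baseline. This is a rough sketch: `parse_rtt` is a made-up helper name, the labels are my own shorthand, and the regular expression assumes the `rtt min/avg/max/mdev = ... ms` summary format shown above.

```python
import re

# Matches the rtt summary line as quoted in the post.
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(line):
    """Return min/avg/max/mdev (in ms) from a ping rtt summary line."""
    match = RTT_RE.search(line)
    if match is None:
        raise ValueError("not an rtt summary line: %r" % line)
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, map(float, match.groups())))


# The three rtt lines quoted above, keyed by a short label.
results = {
    "Linux -> Linux": "rtt min/avg/max/mdev = 0.011/0.013/1.682/0.002 ms",
    "Linux -> Win2012, WinOFED 4.40.0": "rtt min/avg/max/mdev = 0.022/0.024/2.492/0.021 ms",
    "Linux -> Win2012R2, WinOFED 4.55": "rtt min/avg/max/mdev = 0.093/0.102/17.550/0.101 ms",
}

baseline_avg = parse_rtt(results["Linux -> Linux"])["avg"]
for label, line in results.items():
    rtt = parse_rtt(line)
    ratio = rtt["avg"] / baseline_avg
    print(f"{label}: avg {rtt['avg']:.3f} ms ({ratio:.1f}x baseline), "
          f"max {rtt['max']:.3f} ms")
```

The ratio is computed on the average only; the max column is printed separately because a single outlier dominates it.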

I still want it all together: low latency, no packet loss, and the latest software and firmware versions.

I’m running out of ideas, so any comments and advice are appreciated.

Hello, Rian.

I’ve tried installing it already, but it didn’t help. You can see some statistics in my comment above.

Hi,

I would recommend testing without bonding on the Windows side.

Did you check the port counters, are there any errors?

Hello, Rian.

Thank you for the comment. There is no bonding configured on the Windows side. There are no port counter errors, and ibdiagnet shows no errors at all.

I’ve recently installed a new WinOF and firmware. Things are going better now, but I still have some issues with packet loss. As before, Linux-to-Linux communication is good and Linux-to-Windows is not. I ran a few flood ping tests to understand how latency and loss rate depend on the WinOF and firmware versions. Here is the test summary:

Win 2012, WinOF 4.40, fw 2.30.3200

2156359 packets transmitted, 2127781 received, 1% packet loss, time 471502ms

rtt min/avg/max/mdev = 0.032/0.043/3.439/0.019 ms, ipg/ewma 0.218/0.043 ms

Win 2012, WinOF 4.40, fw 2.30.8000

197278 packets transmitted, 193214 received, 2% packet loss, time 57419ms

rtt min/avg/max/mdev = 0.020/0.022/3.742/0.026 ms, ipg/ewma 0.291/0.022 ms

Win 2012, WinOF 4.60, fw 2.30.8000

45086 packets transmitted, 44350 received, 1% packet loss, time 10878ms

rtt min/avg/max/mdev = 0.021/0.025/2.431/0.024 ms, ipg/ewma 0.241/0.023 ms

In this case I had a low packet loss rate and average latency.

Win 2012 R2, WinOF 4.60, fw 2.30.8000

699589 packets transmitted, 698741 received, 0% packet loss, time 57022ms

rtt min/avg/max/mdev = 0.041/0.051/12.418/0.073 ms, ipg/ewma 0.081/0.048 ms

The only case where I had no packet loss, but latency was high.

Win 2012 R2, WinOF 4.45, fw 2.30.8000

347256 packets transmitted, 347256 received, 0% packet loss, time 39444ms

rtt min/avg/max/mdev = 0.081/0.098/13.205/0.087 ms, pipe 2, ipg/ewma 0.113/0.097 ms

In short: WinOF 4.45 provides the best reliability, while WinOF 4.40 provides the best latency.

I’m still looking for a solution that combines the two.
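Since ping prints the loss as a whole percent, the exact rates can be recomputed from the transmitted/received counts in the summary lines above. A minimal sketch, assuming the summary format shown in the test results; `loss_percent` is just an illustrative helper name.

```python
import re

# Matches the packet counters in a ping summary line.
COUNT_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def loss_percent(summary_line):
    """Return the exact packet loss in percent from a ping summary line."""
    match = COUNT_RE.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % summary_line)
    sent, received = map(int, match.groups())
    if sent == 0:
        raise ValueError("no packets transmitted")
    return 100.0 * (sent - received) / sent


# Summary lines copied from the test results above.
runs = [
    ("Win 2012, WinOF 4.40, fw 2.30.3200",
     "2156359 packets transmitted, 2127781 received, 1% packet loss, time 471502ms"),
    ("Win 2012, WinOF 4.40, fw 2.30.8000",
     "197278 packets transmitted, 193214 received, 2% packet loss, time 57419ms"),
    ("Win 2012, WinOF 4.60, fw 2.30.8000",
     "45086 packets transmitted, 44350 received, 1% packet loss, time 10878ms"),
    ("Win 2012 R2, WinOF 4.60, fw 2.30.8000",
     "699589 packets transmitted, 698741 received, 0% packet loss, time 57022ms"),
    ("Win 2012 R2, WinOF 4.45, fw 2.30.8000",
     "347256 packets transmitted, 347256 received, 0% packet loss, time 39444ms"),
]

for label, line in runs:
    print(f"{label}: {loss_percent(line):.3f}% loss")
```

This matters for the two R2 runs: both show "0% packet loss", but the WinOF 4.60 run actually dropped 848 packets, while the WinOF 4.45 run dropped none.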

Hi Alexey,

I would recommend trying the new WinOF release 4.60.

Let me know if that helps.