I am having trouble with latency on a Windows Server InfiniBand (IB) setup. I have read and tried many troubleshooting steps from various forums and from the Mellanox Performance Tuning Guide, but I have not been able to get the setup down to the low latency I imagine the hardware is capable of. I am a complete beginner in the IB space, so it is very possible I am missing some very basic configuration or concept.
I guess it is also possible that 0.2 ms to 0.4 ms of ping latency and roughly 2.5 ms of additional file access latency are to be expected; please let me know if my expectations are off.
Any advice would be greatly appreciated.
==== Hardware/Software Setup ====
HOST01:
Adapter: MCX354A-FCBT
Firmware: 2.40.7000
Motherboard: SuperMicro X8DT3
PCIe Slot: Gen2 x8 (no Gen3 available)
OS: Windows Server 2016 DC
Mellanox Port Mode: IB
IP: 10.255.255.10
HOST02:
Adapter: MCX354A-FCBT
Firmware: 2.40.7000
Motherboard: SuperMicro X8DTN+
PCIe Slot: Gen2 x8 (no Gen3 available)
OS: Windows Server 2016 DC
Mellanox Port Mode: IB
IP: 10.255.255.20
NETWORK:
Direct Connect (Back to Back): Mellanox MC2207130-002
==== VSTAT ====
Here are the vstat outputs for each host.
HOST01: COMMAND: vstat.exe
RESULT: https://pastebin.com/raw/aCk0Ltp4
HOST02: COMMAND: vstat.exe
RESULT: https://pastebin.com/raw/L0zWr7jk
==== Ping Times ====
The ping times seem very high (0.32 ms to 0.45 ms). In fact, they are the same as on my 1 Gb Ethernet connection.
HOST01: COMMAND: hrping.exe 10.255.255.20
RESULT: https://pastebin.com/raw/ubV0ZnsF
HOST02: COMMAND: hrping.exe 10.255.255.10
RESULT: https://pastebin.com/raw/QVTYKkXQ
Note: ibping produces slightly better times of 0.19 ms to 0.24 ms.
==== Throughput ====
The throughput seems “fine”, I guess. I have read in various sources that a 56 Gbps link delivers lower real-world throughput for various reasons. In any case, throughput is not my main concern, since I am focused on IOPS.
HOST01: COMMAND: ntttcp.exe -r -m 28,*,10.255.255.10 -rb 2M -a 16 -t 5
RESULT: https://pastebin.com/raw/RmQSBL2G
HOST02: COMMAND: ntttcp.exe -s -m 28,*,10.255.255.10 -l 512K -a 2 -t 5
RESULT: https://pastebin.com/raw/djsVFs8R
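For what it's worth, one of those "various reasons" can be checked on paper. The sketch below is only back-of-the-envelope arithmetic, assuming PCIe Gen2 runs at 5.0 GT/s per lane with 8b/10b encoding and IB FDR 4x runs at 14.0625 Gbaud per lane with 64b/66b encoding; it ignores PCIe and IB protocol headers, so real numbers will be lower still. The function name `usable_gbps` is just a hypothetical helper for illustration.

```python
# Back-of-the-envelope ceiling for one direction of the data path.
# Assumptions: PCIe Gen2 = 5.0 GT/s per lane, 8b/10b line encoding;
# InfiniBand FDR 4x = 14.0625 Gbaud per lane, 64b/66b line encoding.
# Protocol overheads (PCIe TLP headers, IB packet headers) are ignored.

def usable_gbps(gtransfers_per_lane: float, lanes: int,
                payload_bits: int, line_bits: int) -> float:
    """Signalling rate minus line-encoding overhead, in Gbit/s."""
    return gtransfers_per_lane * lanes * payload_bits / line_bits

pcie_gen2_x8 = usable_gbps(5.0, 8, 8, 10)      # the slot in both hosts
ib_fdr_4x = usable_gbps(14.0625, 4, 64, 66)    # FDR port on the MCX354A-FCBT

print(f"PCIe Gen2 x8: {pcie_gen2_x8:.1f} Gbit/s ({pcie_gen2_x8 / 8:.1f} GB/s)")
print(f"IB FDR 4x:    {ib_fdr_4x:.1f} Gbit/s")
print("Bottleneck:  ", "PCIe slot" if pcie_gen2_x8 < ib_fdr_4x else "IB link")
```

If those assumptions hold, the Gen2 x8 slot tops out around 32 Gbit/s per direction versus roughly 54.5 Gbit/s for the FDR link, so ntttcp landing well below 56 Gbps is expected on this hardware. It does not, by itself, explain the latency problem.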
==== IOPS ====
So, finally, to my actual problem: I have a disk on HOST01 that delivers ~90K IOPS at 0.3 ms latency locally, but over the network it drops to ~10K IOPS with latency of 3.0 ms or more.
HOST01 (Local): COMMAND: diskspd.exe -b8K -d30 -o4 -t8 -h -r -w0 -L -Z1G -c20G x:\share\iotest.dat
RESULT: https://pastebin.com/raw/hikPjDQs
HOST02 (Over IB): COMMAND: diskspd -b8K -d30 -o4 -t8 -h -r -w0 -L -Z1G -c20G \\10.255.255.10\shared\iotest.dat
RESULT: https://pastebin.com/raw/e07Ajx1i
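One way to read those two results side by side is Little's law (IOPS = I/Os in flight / average latency). The sketch below assumes diskspd keeps `-o4` I/Os outstanding on each of the `-t8` threads against the single target file, i.e. a total queue depth of 32; the helper names are hypothetical and only there to make the arithmetic explicit.

```python
# Little's law applied to the diskspd runs above:
#   IOPS = (I/Os kept in flight) / (average latency in seconds)
# Assumption: diskspd keeps -o I/Os outstanding on each of -t threads
# against one target, so total queue depth = threads * outstanding.

def queue_depth(threads: int, outstanding_per_thread: int) -> int:
    """Total I/Os kept in flight by diskspd for one target file."""
    return threads * outstanding_per_thread

def implied_iops(depth: int, latency_ms: float) -> float:
    """Steady-state IOPS implied by `depth` I/Os each taking `latency_ms`."""
    return depth / (latency_ms / 1000.0)

depth = queue_depth(threads=8, outstanding_per_thread=4)  # -t8 -o4

for label, latency_ms in [("local", 0.3), ("over IB", 3.0)]:
    print(f"{label}: ~{implied_iops(depth, latency_ms):,.0f} IOPS at {latency_ms} ms")
```

With a queue depth of 32 this works out to roughly 107K IOPS at 0.3 ms and roughly 10.7K IOPS at 3.0 ms, in the same ballpark as the ~90K and ~10K figures above. If that reading is right, the IOPS drop is just the per-I/O latency at a fixed queue depth, so the thing to chase is where the extra ~2.5 ms per request comes from (raising `-o` would only mask it).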