Sniffer tool for capturing high-speed RoCE traffic on Linux

We are trying to capture RoCE traffic moving at high speed (target speed is 50+GiB) over a mirrored port for debugging purposes. On Windows the mlx5cmd Sniffer worked ok, but it started losing packets at higher speeds and iteration counts. Since we knew that interoperability might be a problem, we switched to Linux.

The ibdump tool does not work (it gives the error “command interface bad param”), and tcpdump only catches RoCE packets when we throttle the speed to 2.5GiB (and even then it’s losing packets). At any higher speed it only catches the TCP packets at the beginning and end.

Is there a tool similar to mlx5cmd -Sniffer for Linux? Alternatively, is there source code available for mlx5cmd somewhere that I haven’t looked yet?

We are running on Red Hat using a Mellanox ConnectX-5 adapter, and we are currently using ib_send_bw to generate test traffic.

I would recommend checking a few things:

  • The latest Mellanox OFED (v5.3) is installed on the tcpdump host
  • The latest versions of tcpdump and libpcap are in use on the tcpdump host
  • tcpdump is run on the NUMA node closest to the adapter
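
For the NUMA point, here is a minimal sketch of how the pinning could be done. It assumes the adapter's network interface is ens1f0 and the capture device is mlx5_0 (as in the output further down), that numactl is installed, and that the kernel exposes the adapter's node under /sys/class/net. The function names and the SYSFS_NET override are made up for illustration.

```shell
#!/usr/bin/env bash

# Print the NUMA node that a network interface's PCI device is attached to.
# SYSFS_NET is a hypothetical override so the function can be tried without real hardware.
nic_numa_node() {
    local iface="$1"
    local root="${SYSFS_NET:-/sys/class/net}"
    local node
    node="$(cat "$root/$iface/device/numa_node" 2>/dev/null)" || return 1
    # The kernel reports -1 when the device has no NUMA affinity; fall back to node 0.
    if [ "$node" -lt 0 ]; then node=0; fi
    printf '%s\n' "$node"
}

# Print (but do not run) a tcpdump command line bound to that node's CPUs and memory.
# $1 = network interface used for the NUMA lookup, $2 = device passed to tcpdump -i.
pinned_capture_cmd() {
    local iface="$1" capdev="$2" node
    node="$(nic_numa_node "$iface")" || return 1
    printf 'numactl --cpunodebind=%s --membind=%s tcpdump -i %s -etn -c 1000000 -s 65535\n' \
        "$node" "$node" "$capdev"
}
```

Calling pinned_capture_cmd ens1f0 mlx5_0 prints the command to run. The installed versions can be checked with ofed_info -s and tcpdump --version.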

On my setup, tcpdump doesn’t lose packets, and I’m running it on the same host as the ib_write_bw client. The speed is about 90 Gbps:

[root@XXXX ~]# mlnx_perf -i ens1f0 | egrep [r]x_bytes
rx_bytes_phy: 11,361,121,028 Bps = 90,888.96 Mbps
rx_bytes_phy: 11,359,701,018 Bps = 90,877.60 Mbps
rx_bytes_phy: 10,054,267,618 Bps = 80,434.14 Mbps
rx_bytes_phy: 10,486,318,032 Bps = 83,890.54 Mbps
rx_bytes_phy: 11,351,984,490 Bps = 90,815.87 Mbps
rx_bytes_phy: 11,354,548,936 Bps = 90,836.39 Mbps
rx_bytes_phy: 11,352,971,872 Bps = 90,823.77 Mbps
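
Since this thread mixes bytes per second, Mbps and "GiB", a small conversion sketch may help when comparing numbers. The helper names are hypothetical; the first input is the first rx_bytes_phy reading above.

```shell
#!/usr/bin/env bash

# Convert a bytes-per-second counter value (commas allowed) to Gbit/s, one decimal place.
bps_to_gbps() {
    printf '%s\n' "$1" | tr -d ',' | awk '{ printf "%.1f\n", $1 * 8 / 1e9 }'
}

# Convert a rate given in GiB/s (2^30 bytes per second) to Gbit/s, one decimal place.
gibps_to_gbps() {
    printf '%s\n' "$1" | awk '{ printf "%.1f\n", $1 * 1073741824 * 8 / 1e9 }'
}

bps_to_gbps "11,361,121,028"   # prints 90.9
gibps_to_gbps 2.5              # prints 21.5
```

Read literally as GiB per second, 2.5 GiB/s is about 21.5 Gbit/s, so it matters which unit is meant when the two setups are compared.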

[root@XXXX ~]# tcpdump -i mlx5_0 -etn -c 1000000 -s 65535 | head -n 5
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on mlx5_0, link-type EN10MB (Ethernet), capture size 65535 bytes
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
tcpdump: Unable to write output: Broken pipe

[root@XXXX ~]# tcpdump -i mlx5_0 -etn -c 1000000 -s 65535 | tail -n 5
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on mlx5_0, link-type EN10MB (Ethernet), capture size 65535 bytes
1000000 packets captured
1000000 packets received by filter
0 packets dropped by kernel
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
24:8a:07:9c:01:86 > 02:48:a0:79:c0:1a, ethertype IPv4 (0x0800), length 1082: 192.168.150.3.58171 > 192.168.150.4.roce: UDP, length 1040
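
To compare runs, the drop counter can be pulled out of the summary that tcpdump prints at the end of a capture. This is only a sketch: kernel_drops is a made-up helper name, and the sample input is the summary from the run above.

```shell
#!/usr/bin/env bash

# Read tcpdump's end-of-capture summary on stdin and print the "dropped by kernel" count.
# Returns non-zero if the summary line is not present.
kernel_drops() {
    awk '/packets dropped by kernel/ { print $1; found = 1 } END { if (!found) exit 1 }'
}

# Example with the summary lines shown above.
kernel_drops <<'EOF'
1000000 packets captured
1000000 packets received by filter
0 packets dropped by kernel
EOF
```

tcpdump writes this summary to stderr, so in practice it could be saved with something like tcpdump -i mlx5_0 -n -c 1000000 -s 65535 -w roce.pcap 2>summary.txt and then checked with kernel_drops < summary.txt (both file names are placeholders). Note that this counter only covers packets dropped from the kernel's capture buffer; packets lost before they reach that buffer would not show up in it.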

Thank you for your reply,

We tried running tcpdump with your flags on the machine running the ib_send_bw client, and still lost a significant number of packets.

We’ve tried a couple of other things as well, including tuned (the tuning daemon) and setting the priority of tcpdump.

EDIT: What was your ib_send_bw command line for the client? Correct me if I’m wrong, but if you don’t pass an iterations value, wouldn’t it run until you kill it, which would allow tcpdump to collect the requested number of packets even if it drops a few?
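
One way to take that ambiguity out of the comparison is to fix the iteration count on the client (perftest accepts -n/--iters) and work out how many data packets the capture should then contain. A rough sketch with a hypothetical helper and hypothetical values: 1000 iterations, 65536-byte messages, and a 1024-byte path MTU. The MTU is an assumption inferred from the 1040-byte UDP payloads in the capture above (1024 bytes of data plus a 12-byte BTH and a 4-byte ICRC).

```shell
#!/usr/bin/env bash

# Expected number of RoCE data packets for a fixed-iteration bandwidth test:
# iterations * ceil(message_size / path_mtu). ACKs and connection setup are not counted.
expected_data_packets() {
    local iters="$1" msg_size="$2" mtu="$3"
    echo $(( iters * ( (msg_size + mtu - 1) / mtu ) ))
}

expected_data_packets 1000 65536 1024   # prints 64000
```

With a client line along the lines of ib_send_bw -d mlx5_0 -n 1000 -s 65536 <server>, where <server> is a placeholder, a capture that shows noticeably fewer RoCE data packets than this figure has lost packets, regardless of what -c was set to.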