Persistent ESTABLISHED State Connections in Conntrack Table in HWOL Environment Post Load Testing

Hello,

I’ve been running load tests in a Hardware Offload (HWOL) conntrack environment and have hit an issue: after the tests finish and all connections on the VM have terminated, the conntrack table still retains flows in the ESTABLISHED state.

Here’s a snippet of the conntrack table:

root@test02:/# conntrack -L | grep ESTA | head -n 10
tcp      6 85390 ESTABLISHED src=10.168.68.115 dst=10.168.66.221 sport=9172 dport=8080 src=10.168.66.221 dst=10.168.68.115 sport=8080 dport=9172 [ASSURED] zone=15 use=1
tcp      6 86092 ESTABLISHED src=10.168.68.118 dst=10.168.66.14 sport=35885 dport=8080 src=10.168.66.14 dst=10.168.68.118 sport=8080 dport=35885 [ASSURED] zone=2 use=1
tcp      6 85561 ESTABLISHED src=10.168.68.218 dst=10.168.66.221 sport=24124 dport=8080 src=10.168.66.221 dst=10.168.68.218 sport=8080 dport=24124 [ASSURED] zone=15 use=1
tcp      6 85721 ESTABLISHED src=10.168.68.153 dst=10.168.66.151 sport=40994 dport=8080 src=10.168.66.151 dst=10.168.68.153 sport=8080 dport=40994 [ASSURED] zone=13 use=1
tcp      6 85879 ESTABLISHED src=10.168.68.148 dst=10.168.66.151 sport=60456 dport=8080 src=10.168.66.151 dst=10.168.68.148 sport=8080 dport=60456 [ASSURED] zone=13 use=1
tcp      6 85397 ESTABLISHED src=10.168.68.143 dst=10.168.66.252 sport=8089 dport=8080 src=10.168.66.252 dst=10.168.68.143 sport=8080 dport=8089 [ASSURED] zone=11 use=1
tcp      6 85687 ESTABLISHED src=10.168.68.195 dst=10.168.66.19 sport=5139 dport=8080 src=10.168.66.19 dst=10.168.68.195 sport=8080 dport=5139 [ASSURED] zone=8 use=1
tcp      6 85786 ESTABLISHED src=10.168.68.170 dst=10.168.66.223 sport=36659 dport=8080 src=10.168.66.223 dst=10.168.68.170 sport=8080 dport=36659 [ASSURED] zone=12 use=1
tcp      6 85591 ESTABLISHED src=10.168.68.150 dst=10.168.66.204 sport=45455 dport=8080 src=10.168.66.204 dst=10.168.68.150 sport=8080 dport=45455 [ASSURED] zone=7 use=1
tcp      6 86087 ESTABLISHED src=10.168.68.215 dst=10.168.66.97 sport=14570 dport=8080 src=10.168.66.97 dst=10.168.68.215 sport=8080 dport=14570 [ASSURED] zone=6 use=1

root@test02:/# conntrack -L | grep -v HW_OFF | awk '{print $4}' | sort | uniq -c
conntrack v1.4.6 (conntrack-tools): 103694 flow entries have been shown.
      2 CLOSE_WAIT
 103589 ESTABLISHED
      2 src=10.168.66.151
      2 src=10.168.66.204
      1 src=10.168.66.221
      1 src=10.168.66.252
      2 src=10.168.66.97
     58 SYN_SENT
     17 TIME_WAIT
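
The `src=...` rows in the tally above probably come from non-TCP entries (UDP, for example), which have no state column, so their fourth field is the source address. A minimal sketch of a tally that counts only TCP entries and skips offloaded flows, assuming the same `conntrack -L` text layout shown above (the function name `tally_tcp_states` is hypothetical):

```shell
# Tally TCP conntrack states from `conntrack -L`-style text on stdin.
# Lines matching HW_OFF are skipped, mirroring the `grep -v HW_OFF` above.
tally_tcp_states() {
    awk '$1 == "tcp" && !/HW_OFF/ { count[$4]++ }
         END { for (s in count) printf "%s %d\n", s, count[s] }' | sort
}

# Usage on a live host (the "flow entries have been shown" summary
# goes to stderr, so it is discarded here):
#   conntrack -L 2>/dev/null | tally_tcp_states
```

This prints one `STATE count` pair per line, sorted by state name.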

The current netfilter conntrack table size is set to 1048576, and the number of entries did not exceed 800K at any point during the load test.
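
For reference, table occupancy can be read from the kernel's `nf_conntrack_count` and `nf_conntrack_max` sysctls. The helper below is only a sketch (the name `conntrack_usage` is hypothetical) that formats the two numbers as a percentage:

```shell
# Print "count/max (percent%)" for a conntrack table.
# $1 = current entry count, $2 = table limit
conntrack_usage() {
    awk -v c="$1" -v m="$2" \
        'BEGIN { printf "%d/%d (%.1f%%)\n", c, m, 100 * c / m }'
}

# Usage on a live host:
#   conntrack_usage "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                   "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

With the figures quoted above, `conntrack_usage 800000 1048576` prints `800000/1048576 (76.3%)`.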

Here are the details of my setup:

Hardware

  • Server: Dell R7615
  • CPU: AMD Epyc 9654P
  • Memory: 384GB
  • NUMA: 1
  • NIC: ConnectX-6 Lx

Software Versions

  • OS: Ubuntu 22.04.2 LTS
  • Kernel: 5.15
  • OpenStack Version: Yoga
  • OVN: 22.03
  • OVS: 2.17.5
  • MLNX_OFED Driver: 5.8-2.0.3
  • Firmware: 26.35.1012 (DEL0000000031)

Any insights or suggestions on how to resolve this issue would be greatly appreciated.

Thank you.

Hi @kyoon,

There was a kernel bug that prevented these connections from being aged out. It was fixed in v5.18.

Regards,
Chen
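
A quick way to compare a running kernel against the v5.18 mentioned above is sketched below. It assumes GNU `sort -V` is available, and the function name `kernel_at_least` is hypothetical. A distribution kernel may carry backported fixes, so a lower version number does not by itself prove the fix is missing.

```shell
# Succeed if the kernel version in $1 is at least the upstream version in $2.
# $1 = version string such as the output of `uname -r`
# $2 = minimum upstream version, e.g. 5.18
kernel_at_least() {
    have=${1%%-*}   # drop the distro suffix after the first "-"
    [ "$(printf '%s\n%s\n' "$2" "$have" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a live host:
#   kernel_at_least "$(uname -r)" 5.18 && echo "5.18 or newer"
```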

Hi @Chen,

Thank you for your response and for shedding light on the issue. I appreciate your help.

I’ve looked into the patch history for kernel 5.18 regarding conntrack, but I couldn’t find the specific bugfix commit you mentioned. Could you possibly share the commit details?

I am currently using Ubuntu 22.04 (Jammy), and I plan to ask whether this fix is scheduled to be backported to this distribution. Having the commit details would be extremely helpful.

Thanks again for your assistance.

Best regards,
kyoon