Throughput drops drastically while running DOCA Sample Application

I am trying to run the flow_acl sample application and have modified flow_acl_sample.c to match my setup.

My setup, OVS configuration, SF configuration, flow rules, and experiment output are in the following PDF.

DOCA_Issue.pdf (93.6 KB)

I am not sure about the reason behind such a drastic throughput drop. With a simple switch configuration and a single iperf3 client I get around 45 Gbps, but with flow_acl I am getting only around 400 Mbps.

My primary suspicion is that the rules are somehow not getting offloaded to the NIC. I have made sure to enable hw-tc-offload and OVS hw-offload, and I have also tried tc flower directly, but no luck there either (roughly the commands I used are listed below). My general observation is that with a single OVS bridge everything works perfectly fine, but as soon as I use multiple OVS bridges my throughput drops drastically.
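For reference, these are roughly the offload-related commands I ran (a sketch; device names come from my setup, and I repeated the ethtool step for the other representors as well):

ethtool -K p0 hw-tc-offload on                        # enable TC hardware offload per device
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch-switch                  # restart OVS so hw-offload takes effect (service name may differ)
ovs-appctl dpctl/dump-flows type=offloaded            # check which datapath flows are actually offloaded
tc filter show dev p0 ingress                         # inspect tc flower rules when testing tc directly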

Hi @rvarde2,
Did you find the reason? Where do you see the 400 Mbps: on pf0hpf, p0, or one of the representor ports?

I have not found the solution yet. I am getting 400 Mbps on the host when traffic is flowing through the DPU.

What frame size do you send?
Have you seen this DPDK report?

Also, please check the core-list option (-l); I think -l 0,1 is not enough for your case (see the example below).
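For example, something like the following (just a sketch; I am assuming the sample binary is built as doca_flow_acl and the rest of the arguments stay as in your run):

./doca_flow_acl -l 0-3 <other EAL and application arguments>    # four cores instead of two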

Hi there, giving more cores didn’t help increase throughput. Since the rules are statically offloaded to the NIC, I think two cores should be sufficient. In my opinion the issue has something to do with how I am setting up OVS with multiple bridges. Please take a look at the shared PDF and let me know if I am making any mistake.

Thank you

Rohan

Could you please share the output of:

  • ovs-vsctl list Open_vSwitch
  • ovs-dpctl show
  • ovs-vsctl show

And could you please share these for both the “good” case and the “bad” case?

sudo ovs-vsctl list Open_vSwitch
_uuid : 16266ee6-e532-4a01-be71-b9ef78f31320
bridges : [175e0c71-1fea-469d-ab66-51e9f6783e85, 7539d5ab-5c80-4df2-b8b5-a15d4fbda862, d6742a20-466e-4659-b3a1-636de3869acb]
cur_cfg : 13
datapath_types : [netdev, system]
datapaths : {}
db_version : "8.3.1"
doca_initialized : false
doca_version : "2.2.0080"
dpdk_initialized : false
dpdk_version : "MLNX_DPDK 22.11.2307.2.0"
external_ids : {hostname=localhost.localdomain, rundir="/var/run/openvswitch", system-id="63c35c00-7df8-4117-948b-f0f2fe5f3377"}
iface_types : [bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options :
next_cfg : 13
other_config : {hw-offload="true"}
ovs_version : "2.17.8-3feee121f"
ssl :
statistics : {}
system_type : ubuntu
system_version : "22.04"

sudo ovs-dpctl show
system@ovs-system:
    lookups: hit:0 missed:290073 lost:171
    flows: 0
    masks: hit:1 total:0 hit/pkt:0.00
    caches:
        masks-cache: size:256
    port 0: ovs-system (internal)
    port 1: ovsbr1 (internal)
    port 2: pf0hpf
    port 3: p0
    port 4: en3f0pf0sf0
    port 5: ovsbr2 (internal)
    port 6: ovsbr3 (internal)
    port 7: en3f0pf0sf1
    port 8: en3f0pf0sf2

sudo ovs-vsctl show
16266ee6-e532-4a01-be71-b9ef78f31320
    Bridge ovsbr3
        Port p0
            Interface p0
        Port en3f0pf0sf2
            Interface en3f0pf0sf2
        Port ovsbr3
            Interface ovsbr3
                type: internal
    Bridge ovsbr1
        Port en3f0pf0sf0
            Interface en3f0pf0sf0
        Port ovsbr1
            Interface ovsbr1
                type: internal
    Bridge ovsbr2
        Port pf0hpf
            Interface pf0hpf
        Port ovsbr2
            Interface ovsbr2
                type: internal
        Port en3f0pf0sf1
            Interface en3f0pf0sf1
    ovs_version: "2.17.8-3feee121f"

Sorry, I didn’t understand what you are referring to with the “good” and “bad” case. Apologies for the late reply. Thanks a lot for helping me out.

Hi,

Can you please explain in more detail the difference between the case where you see 45 Gbps and the case where you see 400 Mbps? Where are you running flow_acl when you get 400 Mbps, and what is the simple switch configuration on which you see 45 Gbps? Do you see packets reaching the DPU’s kernel in the 400 Mbps case, and are any CPUs pegged?
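For example, on the DPU’s Arm side you could check with something like (rough commands; interface names taken from your setup):

tcpdump -i p0 -nn -c 100        # do packets show up on the uplink?
tcpdump -i pf0hpf -nn -c 100    # and on the host PF representor?
top                             # is ovs-vswitchd or the DOCA app pegging a core?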

Hello, my simple switch configuration is the default one, as shown below: traffic flows between two hosts, each connected through its DPU. The only rule applied to the switch is: ovs-ofctl add-flow ovsbr0 action=normal (a rough sketch of this setup follows the figure).

[Figure: simple switch topology, two hosts connected through the DPU]
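For reference, the simple switch case is roughly the following (a sketch; I am assuming a single bridge containing the uplink and the host PF representor, which is what my default setup amounts to):

ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 p0          # uplink
ovs-vsctl add-port ovsbr0 pf0hpf      # host PF representor
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
ovs-ofctl add-flow ovsbr0 action=normal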

When I try flow_acl, I run the program on the DPU, but the traffic still flows end-to-end between the hosts. More details are in the PDF file I shared when I started this thread.

Let me know what other details I can provide. I can share a pcap if you want (not sure how useful it would be). Alternatively, you could share a script that deploys the configuration required for flow_acl on the DPU; I can run it on my setup and provide its output for your debugging.

Thank you,

Rohan

I have also followed the quick start guide and tried various combinations of setting up OVS-DPDK as shown in the attachment, but I am still not getting the expected results.


In your drawing, you have p0 and pf0hpf on ovsbr1. In your configuration, it seems like they are in different bridges… can you clarify what’s going on here?

Few more questions:

  • If you don’t run the DOCA flow_acl program, what does your performance look like?
  • On the ARM cores, if you take a tcpdump on p0 and pf0hpf, do you see the packets?
  • Can you please share your script you are using to configure OVS?
  • Can you please share your OVS logs? (see the note after this list for where to find them)
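For the logs question: on the DPU, ovs-vswitchd normally writes to /var/log/openvswitch/ovs-vswitchd.log. If it helps, you can raise the log verbosity before reproducing (rough command):

ovs-appctl vlog/set ANY:file:dbg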

Throughout my post, I talk about two configurations.

  1. Simple switch: This is the default configuration. The figure I shared today demonstrates it, and it gives 45 Gbps with a single flow.
  2. Flow_ACL: This program is one of the samples provided with the DOCA SDK. The configuration required to run it is inspired by this example: NVIDIA DOCA NAT Application Guide - NVIDIA Docs. (A sketch of my bridge layout for this case follows this list.)
    (In fact, I have tried the NAT example as well, but the result is the same as with flow_acl.)
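For clarity, the bridge layout I use for the Flow_ACL case is roughly the following (a sketch reconstructed from the ovs-vsctl show output earlier in this thread; it assumes the SFs en3f0pf0sf0/1/2 have already been created):

ovs-vsctl add-br ovsbr1
ovs-vsctl add-port ovsbr1 en3f0pf0sf0
ovs-vsctl add-br ovsbr2
ovs-vsctl add-port ovsbr2 pf0hpf          # host PF representor
ovs-vsctl add-port ovsbr2 en3f0pf0sf1
ovs-vsctl add-br ovsbr3
ovs-vsctl add-port ovsbr3 p0              # uplink
ovs-vsctl add-port ovsbr3 en3f0pf0sf2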

When I checked with tcpdump on pf0hpf and p0 (screenshot: https://global.discourse-cdn.com/nvidia/original/4X/6/d/7/6d7260588d1952c5d327e485369cd8a062146cb9.jpeg), I don’t see any packets associated with the flow.

I have an interactive script that I created for all the configurations I have shared so far.

If you can let me know what your OVS configuration looks like, I can try to match it. If convenient, we can set up a video call and I can run all the configurations in front of you. Here is my email address: rvarde2@uic.edu

Thank you

Makes sense, thanks for explaining. I don’t have a setup handy to test out your exact config.

A few ideas that could help narrow this down:

  • can you run mlnx_perf and ethtool on your p0 while traffic is running to see if there are any drops? (rough commands after this list)
  • if you disable OVS and run the DOCA app directly on p0 and pf0hpf, how does performance look?
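For example (rough commands; p0 taken from your setup):

mlnx_perf -i p0                              # per-second hardware counters while traffic is running
ethtool -S p0 | grep -iE 'drop|discard'      # look for drop/discard counters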