ConnectX-7 perf drop when scaling from 2 to 4 VFs

I am working with a ConnectX-7 NIC running firmware version 28.47.1088 with FLEX_PARSER_PROFILE_ENABLE set to 3, connected via 2x200Gbps links.

Using 512-byte packets I can achieve 88.8 Mpps (2xPF) and 88.4 Mpps (2xVF) using testpmd with 16 cores and 16 queues.
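For context, here is a rough sketch of the theoretical packet rate for this link, to show how close the 2xPF/2xVF numbers are to line rate. It assumes the 512-byte size includes the Ethernet FCS and that each frame carries the usual 20 bytes of on-wire overhead (preamble, SFD, inter-frame gap); the function name is just illustrative.

```python
# Assumption: frame_bytes includes the 4-byte FCS; on-wire overhead per frame
# is 20 bytes (7 preamble + 1 SFD + 12 inter-frame gap).
ETH_OVERHEAD_BYTES = 20


def line_rate_mpps(link_gbps, frame_bytes, overhead_bytes=ETH_OVERHEAD_BYTES):
    """Theoretical maximum packet rate in Mpps for a given link speed."""
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return link_gbps * 1e9 / bits_per_frame / 1e6


if __name__ == "__main__":
    max_mpps = line_rate_mpps(2 * 200, 512)  # both 200Gbps ports combined
    for label, measured in (("2xPF", 88.8), ("2xVF", 88.4)):
        pct = 100 * measured / max_mpps
        print(f"{label}: {measured} Mpps of {max_mpps:.1f} Mpps ({pct:.1f}%)")
```

Under those assumptions the ceiling is roughly 94 Mpps, so the 2-port results are within a few percent of line rate.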

When I try to scale the number of VFs from 2 to 4, my perf with 16 queues per VF drops to 51.3 Mpps. Perf is limited by “rx_prio0_discards”.

./2511/dpdk/build/app/dpdk-testpmd -l 145-178 -n 4 -a c9:00.2 -a c9:00.3 -a c9:02.2 -a c9:02.3 --file-prefix=dut -- -i -a --rxq=16 --txq=16 --txd=1024 --rxd=1024 --nb-cores=16

If I reduce the number of queues per VF, I get better performance.

  • 12 queues per VF: 59.0 Mpps (14.75 Mpps per VF), limited by “rx_prio0_discards”

  • 10 queues per VF: 65.6 Mpps (16.4 Mpps per VF), limited by “rx_prio0_discards”

  • 8 queues per VF: 63.0 Mpps (15.75 Mpps per VF), limited by CPU/rx misses.

I seem to be running into some resource contention related to the number of queues per VF x the number of VFs. Has anyone else encountered this while scaling VFs on a ConnectX-7?

If I use 2 VFs with 24 queues each, the perf falls to 60 Mpps, which is roughly equivalent to the perf for 4 x 12-queue VFs. Scaling above 32 total queues appears to be the limitation.
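To make the pattern easier to see, here is a small sketch that tabulates the runs above by total queue count (VFs x queues per VF). The numbers are copied from this post; the data layout and function name are just for illustration, and "not stated" marks runs where I did not report a limiting counter.

```python
# Measurements from this post: (VFs, queues per VF, total Mpps, limiting factor).
RUNS = [
    (2, 16, 88.4, "not stated"),
    (4, 16, 51.3, "rx_prio0_discards"),
    (4, 12, 59.0, "rx_prio0_discards"),
    (4, 10, 65.6, "rx_prio0_discards"),
    (4, 8, 63.0, "cpu/rx misses"),
    (2, 24, 60.0, "not stated"),
]


def summarize(runs):
    """Return one row per run, sorted by total queue count."""
    rows = []
    for vfs, queues_per_vf, mpps, limit in runs:
        rows.append({
            "vfs": vfs,
            "queues_per_vf": queues_per_vf,
            "total_queues": vfs * queues_per_vf,
            "total_mpps": mpps,
            "mpps_per_vf": mpps / vfs,
            "limit": limit,
        })
    return sorted(rows, key=lambda r: r["total_queues"])


if __name__ == "__main__":
    print(f"{'VFs':>3} {'q/VF':>4} {'total q':>7} {'Mpps':>6} {'Mpps/VF':>8}  limit")
    for r in summarize(RUNS):
        print(f"{r['vfs']:>3} {r['queues_per_vf']:>4} {r['total_queues']:>7} "
              f"{r['total_mpps']:>6.1f} {r['mpps_per_vf']:>8.2f}  {r['limit']}")
```

Sorted this way, the two 48-queue runs (4 x 12 and 2 x 24) land next to each other at about 59-60 Mpps, and throughput keeps falling as the total goes from 40 to 64 queues, regardless of how the queues are split across VFs.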