High ksoftirqd load on Mellanox ConnectX-6 DX on Linux

I’m currently trying to get a 100G Mellanox ConnectX-6 Dx working with Suricata on Debian Bookworm. When I send around 70 Gbit/s of traffic via Cisco T-Rex from another machine, 63 ksoftirqd threads spike to 100% and stay there as soon as I bring the link up via ip link set eth5 up.
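
For anyone who wants to reproduce the observation: the per-CPU soft-IRQ load can be watched with standard tools, e.g.:

# per-CPU utilisation, the %soft column is the interesting one (sysstat package)
mpstat -P ALL 1
# per-thread view of the ksoftirqd kernel threads themselves (sysstat package)
pidstat -C ksoftirqd 1
# raw soft-IRQ counters; NET_RX should be the row climbing under receive load
watch -n 1 'grep -E "CPU|NET_RX" /proc/softirqs'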

I also ran perf top on those threads. At first __nf_conntrack_alloc showed a big overhead, so I disabled iptables and conntrack (which I don’t need for this test). Now most of the overhead (65%) is in native_queued_spin_lock_slowpath, and I’m struggling to understand why the softirq load is so high.
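
For reference, taking conntrack out of the picture and profiling the affected cores went roughly along these lines (this is a dedicated test box with no rules I need, so flushing everything is fine here; exact commands may differ on other setups):

# drop all netfilter rules so nothing references conntrack any more (test box only!)
nft flush ruleset
# unload conntrack so __nf_conntrack_alloc cannot show up at all
# (dependent modules such as nf_nat have to be removed first if they are loaded)
modprobe -r nf_conntrack
# then profile only the cores the NIC queues are pinned to
perf top -g -C 128-189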

Some basics on the system for reference:

CPU (disabled HT)

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         52 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  256
  On-line CPU(s) list:   0-255
Vendor ID:               AuthenticAMD
  BIOS Vendor ID:        AMD
  Model name:            AMD EPYC 9754 128-Core Processor
    BIOS Model name:     AMD EPYC 9754 128-Core Processor                 CPU @ 2.2GHz
    BIOS CPU family:     107
    CPU family:          25
    Model:               160
    Thread(s) per core:  1
    Core(s) per socket:  128
    Socket(s):           2
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-127
  NUMA node1 CPU(s):     128-255

NIC

e1:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
e1:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
ethtool -i eth5
driver: mlx5_core
version: 6.1.0-17-amd64
firmware-version: 22.36.1010 (DEL0000000027)
expansion-rom-version:
bus-info: 0000:e1:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Kernel and boot parameters

BOOT_IMAGE=/boot/vmlinuz-6.1.0-17-amd64 root=/dev/mapper/root-root ro iommu=pt processor.max_cstate=0 numa_balancing=disable intel_pstate=disable intel_idle.max_cstate=0 quiet splash

I tried disabling all the offloads (ethtool commands further down) and I also ran the affinity script:

set_irq_affinity_cpulist.sh 128-189 eth5

The cores match the 63 queues the NIC provides on that interface/port. (Side note: why the heck does it have 63 queues and not a power of two? :p)
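
For reference, checking the channel count and switching the offloads off went roughly like this (the offload list is probably not exhaustive):

# show how many combined channels (queues) the port exposes and currently uses
ethtool -l eth5
# (the count could also be pinned to a power of two with: ethtool -L eth5 combined 32)
# turn off the usual offloads for the IDS test
ethtool -K eth5 gro off lro off tso off gso off rxvlan off txvlan off
# double-check what is actually active afterwards
ethtool -k eth5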

I can see the traffic being received and all that, but I’m wondering why ksoftirqd peaks at 100% all the time. 70 Gbit/s spread over 63 queues is only about 1.1 Gbit/s per queue/core (roughly 90k packets/s per core if T-Rex sends ~1500-byte frames, far more with small packets). Is that already too much for that NIC? If so, how is it supposed to handle 100G with the 63 queues otherwise?

Thanks

Hi @norg_dev,

Thank you for posting your query on our community.

The ‘ksoftirqd’ process is a kernel thread allocated per CPU to handle heavy soft-interrupt load. It is not wasting your CPU; rather, it helps process your IRQ load more efficiently.
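
For example, you can see that there is exactly one such thread per CPU and which CPU each one is bound to:

# one ksoftirqd/<N> thread per CPU; the PSR column shows the CPU it runs on
ps -eo pid,psr,comm | grep ksoftirqd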

To further tune your server, we recommend following this article - ESPCommunity
Also, ensure that the RAM slots on the system board are fully populated; we have seen a performance improvement with AMD processors when all DIMMs are populated.

Lastly, I notice that your adapter is showing a Dell PSID. I recommend reaching out to Dell support for further assistance.

Thanks,
Bhargavi

Hi,

thanks for your reply!

I already read more about ksoftirqd and also saw the post you linked. What puzzles me is that the usage is this high with the Mellanox card, while the same traffic forwarded to the same box but received on an Intel E810 (100G) NIC doesn’t show this behaviour. I can also see a clear difference when I run perf top -g on the system, so I would narrow it down to some difference in how the traffic is handled by the Mellanox NIC (and/or its driver/firmware).
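
The comparison is essentially this, recorded on whichever cores the respective NIC’s IRQs are pinned to (the E810 core range below is just a placeholder):

# 10s profile of the cores handling the Mellanox queues ...
perf record -g -C 128-189 -o perf.mlx5.data -- sleep 10
# ... and the same for the cores handling the E810
perf record -g -C <e810-cores> -o perf.e810.data -- sleep 10
# then compare the two profiles
perf diff perf.e810.data perf.mlx5.data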

So I’m wondering whether the Mellanox driver or even the firmware needs some specific setting that differs from the Intel one.
The affinity script might also behave differently.
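
If it is one of the driver-exposed knobs, I assume it would show up under the private flags or the ring/coalescing settings, e.g.:

# mlx5-specific toggles exposed by the driver
ethtool --show-priv-flags eth5
# ring sizes and interrupt coalescing settings
ethtool -g eth5
ethtool -c eth5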

Yes, it’s in a Dell system; I already got the latest firmware from Dell to make sure I’m running the latest release.

I also installed the OFED drivers. Is there any way to tell Linux to use those instead of the built-in kernel one?
Or should another interface name appear instead of eth5?
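
In case it matters for an answer, this is how I would check which module is actually in use (my assumption being that an OFED build installs outside the in-tree kernel/ module path and reports its own version string):

# which mlx5_core module does the kernel load, and what version does it report?
modinfo mlx5_core | grep -E '^(filename|version)'
# the in-tree driver reports the kernel version here (as in the ethtool output above)
ethtool -i eth5
# prints the installed MLNX_OFED release, if the OFED user-space tools are present
ofed_info -s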