The performance of event APIs could be bounded by softirqs

I’m trying to use RDMA event mode to handle many connections. I found that performance degrades under event mode when there are many send requests.

What I observe is that as few as 10 clients can keep a ksoftirqd thread busy, and only one CPU core handles the interrupts, which appears to be the performance bottleneck.
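
To confirm that all the completion interrupts land on a single core, the IRQ and softirq counters can be inspected directly. A quick diagnostic sketch; the mlx5 IRQ naming and the IRQ number 123 are assumptions for this setup:

# Check which CPU column accumulates counts for the mlx5 vectors
grep -E 'CPU|mlx5' /proc/interrupts

# Inspect the affinity of one of those IRQs (123 is illustrative)
cat /proc/irq/123/smp_affinity_list

# Watch per-CPU softirq counters; one column (e.g. TASKLET or NET_RX)
# growing much faster than the rest points at a single-core bottleneck
watch -n1 -d cat /proc/softirqs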

This issue can be reproduced with perftest (GitHub: linux-rdma/perftest, InfiniBand Verbs Performance Tests).

First, kill any stale instances on both sides:

pkill ib_send_lat

To launch the server:

# each server instance must listen on its own port
for i in $(seq 1 10); do
  ./ib_send_lat -e -n $N_ITER -p $((port + i)) &
done

To launch the clients:

# ports must match the server instances
for i in $(seq 1 10); do
  ./ib_send_lat $CLIENT_IP -e -n $N_ITER -p $((port + i)) &
done

Then use htop on the server side to watch the ksoftirqd threads and the per-core CPU usage.
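
If htop is not available, the same picture can be taken from the command line. A sketch; only the ksoftirqd thread names are standard, the rest is illustrative:

# Per-thread CPU usage of the softirq daemons (pidstat comes with sysstat)
pidstat -t -p $(pgrep -d, ksoftirqd) 1

# Or show threads sorted by CPU and look for a ksoftirqd/<n> near 100%
top -H -o %CPU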

Linux Distribution:

LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.8.2003 (Core)
Release: 7.8.2003
Codename: Core

Linux Kernel and Version:
Linux gpu01.cluster 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

InfiniBand hardware and firmware version:

driver: mlx5_core[ib_ipoib]
version: 5.0-2.1.8
firmware-version: 16.21.2010 (MT_0000000010)
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

NIC: Mellanox Technologies MT27800 Family [ConnectX-5]

Hello and thank you for contacting us.

Looking at the information you shared, this needs deeper debugging than we can provide in the community.
I would advise opening a case via the support portal so our support engineers can look into this issue.
