CPU utilization reaches 100% over InfiniBand in file transfer applications

I am running a file transfer over InfiniBand Ethernet and using the Reliable Connection (RC) transport for the data transfer. I am noticing that CPU utilization hits its maximum for large file transfers, or for any file transfer with a large buffer size. It reaches 100% on a single thread, and distributing the work across multiple threads produces the same result. Is this usual behavior for InfiniBand applications?


Many factors can contribute to performance issues and CPU overhead, e.g. OS/kernel, NVIDIA driver/FW, BIOS/OS/HCA tuning, the tool(s) used for testing (ib_write_bw/ib_read_bw/ib_send_bw), server architecture, PCIe (Gen/width), etc.
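As a rough illustration of the PCIe (Gen/width) point, the sketch below estimates the raw bandwidth of a slot and compares it with a link rate. The per-lane rates and line encodings are the standard PCIe figures; the function names and the 100 Gb/s link rate are illustrative assumptions, and the result ignores packet/protocol overhead, so real throughput will be lower.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_raw_gbps(gen, lanes):
    """Raw one-direction slot bandwidth in Gb/s, after line encoding only."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt_s * efficiency * lanes


def slot_can_carry(gen, lanes, link_gbps):
    """True if the slot's raw bandwidth is at least the link rate (hypothetical helper)."""
    return pcie_raw_gbps(gen, lanes) >= link_gbps


if __name__ == "__main__":
    link_gbps = 100  # assumed link rate, for illustration only
    for gen, lanes in [(3, 8), (3, 16), (4, 8)]:
        bw = pcie_raw_gbps(gen, lanes)
        verdict = "ok" if slot_can_carry(gen, lanes, link_gbps) else "bottleneck"
        print(f"Gen{gen} x{lanes}: {bw:.1f} Gb/s raw -> {verdict} for {link_gbps} Gb/s")
```

If the slot comes out narrower or slower than the link, the adapter cannot reach line rate no matter how the application is tuned.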

As a starting point, you can refer to our community site for articles on BIOS, OS, and HCA tuning.
I would also recommend using our latest NVIDIA IB driver/FW.

Are you checking the CPU utilization via top/htop? What tool(s) are you using to measure bandwidth and CPU utilization?
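How the number is measured matters, because a thread that spins while waiting shows close to 100% in top/htop regardless of how much data it moves. The sketch below is not InfiniBand code and makes no claim about what your application does. It only shows one way to compare CPU time with wall-clock time around a piece of work, with `busy_wait` and `blocking_wait` as hypothetical stand-ins for a polling loop and an event-driven wait.

```python
import time


def cpu_utilization(work):
    """Run work() once and return CPU seconds / wall-clock seconds for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def busy_wait(seconds=0.2):
    # Stand-in for a loop that keeps checking for completions without blocking.
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def blocking_wait(seconds=0.2):
    # Stand-in for a thread that sleeps until it is woken up.
    time.sleep(seconds)


if __name__ == "__main__":
    print(f"busy wait:     {cpu_utilization(busy_wait):.0%}")
    print(f"blocking wait: {cpu_utilization(blocking_wait):.0%}")
```

Wrapping the transfer call of your own application in `cpu_utilization` would give a per-run ratio to report alongside the top/htop figures.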

What type of ConnectX card are you using?

Are you seeing the same issue with Datagram as with RC?

If the issue needs a deeper investigation, you will need to open a case under a valid support contract so that it can be analyzed and debugged further.