Mellanox perftest CPU at 100%: why so high?

I have a Mellanox 455A 100G network card and tested it with perftest (ib_read_bw, etc.). htop and top show one CPU core at 100%. Each of my two hosts has one processor; one has 6 cores and the other has 4. With the --cpu_util parameter, the 6-core host reports 16.8% CPU and the 4-core host reports 25%. Does perftest drive one core to 100% and then divide by the number of cores? Why is the CPU usage so high? Shouldn't CPU usage be low with RDMA?

Hi 1364236361,

Welcome, and thank you for posting your inquiry to the NVIDIA Developer Forums!

--cpu_util will report the correct CPU utilization. In htop, the red portion of each CPU bar shows how much of the time is spent in kernel space.
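The figures in the question (16.8% on 6 cores, 25% on 4 cores) are consistent with one fully busy core averaged over all cores in the host. Below is a minimal sketch of that arithmetic. It assumes the reported value is a system-wide average, which this thread does not confirm; `system_wide_util` is a hypothetical helper for illustration and is not part of perftest.

```python
def system_wide_util(busy_cores: float, total_cores: int) -> float:
    """Return average CPU utilization (percent) across all cores.

    Assumption: `busy_cores` cores are 100% busy and the rest are idle.
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return 100.0 * busy_cores / total_cores


# One busy core on the 6-core and 4-core hosts from the question.
for cores in (6, 4):
    print(f"{cores} cores: {system_wide_util(1, cores):.1f}%")
```

Under that assumption the 6-core host comes out at roughly 16.7% and the 4-core host at 25%, close to the values reported in the question.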

See the following link in our Enterprise Support Portal for more details on our MLNX_OFED-based perftest:

Or this page on Github for upstream/opensource perftest documentation:

Relevant points in this documentation:

  • The benchmarks use the CPU cycle counter to get time stamps without context switching.

  • The benchmarks generate a synthetic stream of operations, which is very useful for hardware and software benchmarking and analysis. The benchmarks are not designed to emulate any real application traffic.

Breaking that down: in these perftests, the data is synthetically generated by the CPU, and CPU cycle counts are used as a performance metric.

In real-life applications, the data to transmit is generated by the application (e.g., GPU compute with MPI, offloaded storage traffic). The actual Tx/Rx of this data bypasses the kernel and CPU by reading, writing, and operating on memory directly using RDMA verbs.

NVIDIA Enterprise Experience
