hwmon monitoring for mlx3 and mlx4


Some of the NICs I use (most notably Solarflare NICs) have an awesome feature: they expose temperatures and voltages through the Linux hwmon infrastructure. This makes it very easy to monitor the NIC's vital signs with the sensors program.

I wish the Mellanox NICs had the same feature. They often run hot, have only passive cooling, and depend on airflow from the computer case. Without monitoring, it is hard to tell whether they are getting enough airflow.

I am aware that there are tools that can read the temperature. The problem is that they are not packaged in any distro and cannot be easily hooked into existing monitoring systems (collectd, Prometheus node_exporter), which already pick up hwmon sensors automatically.
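
To illustrate why hwmon is so easy to integrate: a driver that registers with hwmon gets a directory under /sys/class/hwmon containing a `name` file and `tempN_input` files that report millidegrees Celsius. The sketch below reads every such sensor with no vendor tooling. The function name `read_hwmon_temps` and its `root` parameter are made up for this example, and which chips appear depends entirely on the drivers loaded on a given machine.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return a list of (hwmon_dir, chip_name, label, celsius) tuples.

    Hypothetical helper for illustration; it only relies on the standard
    hwmon sysfs layout (name, tempN_input, optional tempN_label).
    """
    readings = []
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_file in sorted(chip.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius
                millideg = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # skip sensors that are unreadable or return junk
            prefix = temp_file.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip / f"{prefix}_label"
            label = label_file.read_text().strip() if label_file.exists() else prefix
            readings.append((chip.name, chip_name, label, millideg / 1000.0))
    return readings


if __name__ == "__main__":
    for hwmon_dir, chip_name, label, celsius in read_hwmon_temps():
        print(f"{hwmon_dir} {chip_name} {label}: {celsius:.1f} C")
```

Generic collectors walk the same files, which is why a NIC that registers with hwmon shows up in existing dashboards without any extra configuration.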

Hello Baryluk,

Thank you for posting your inquiry on the NVIDIA Networking Community.

Unfortunately, the only value we expose is the ASIC temperature, which you can read through the ‘mget_temp’ tool (provided by Mellanox Firmware Tools → https://www.mellanox.com/products/adapter-software/firmware-tools). Otherwise, the adapter firmware prints temperature- or voltage-related messages to the system messages file when a threshold is exceeded.

Thank you and regards,

~NVIDIA Networking Technical Support

I know all this. I even said so in my original post.

Please expose temperatures and voltages using hwmon, like everybody else.