Mellanox ConnectX-4 missing sysfs hwmon entries

Dear Mellanox Support team,

On our dual-port 100G ConnectX-4 cards running the newest firmware, version 12.28.2302, there is no sysfs hwmon entry for reading the card's temperature.

On the other hand, our ConnectX-4 Lx cards running their newest firmware version 14.32.1900 have a sysfs hwmon entry.
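For anyone who wants to compare their own systems, the following is a minimal sketch of how the presence of such an entry can be checked. It assumes the standard Linux hwmon sysfs layout (a "name" file and "temp*_input" files in millidegrees Celsius under /sys/class/hwmon/hwmon*). The function name read_hwmon_temps and its parameters are made up for illustration, and the driver name "mlx5" used in the usage note is an assumption about how mlx5_core labels its hwmon device.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon", driver_name=None):
    """Return (hwmon_dir, name, sensor_file, degrees_C) tuples found under root.

    driver_name, if given, keeps only hwmon devices whose "name" file
    matches it exactly.
    """
    readings = []
    root_path = Path(root)
    if not root_path.is_dir():
        return readings  # no hwmon class directory on this system

    for hwmon_dir in sorted(root_path.glob("hwmon*")):
        try:
            name = (hwmon_dir / "name").read_text().strip()
        except OSError:
            continue  # entry without a readable name file
        if driver_name is not None and name != driver_name:
            continue

        for sensor in sorted(hwmon_dir.glob("temp*_input")):
            try:
                millidegrees = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor unreadable or not a number
            readings.append(
                (hwmon_dir.name, name, sensor.name, millidegrees / 1000.0)
            )
    return readings
```

Calling read_hwmon_temps(driver_name="mlx5") on an affected system should return an empty list if the entry is missing, and one tuple per temperature sensor otherwise.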

I looked into the mlx5_core driver and noticed that it checks whether a register value is set here. When I removed that check and recompiled the kernel module, a sysfs hwmon entry was also created for the non-Lx ConnectX-4 cards.

The register seems to be called MCAM. I noticed a bug fix regarding that register inside the ConnectX-4 Lx bug fix history here (Ref. 2339971).

This fix seems to be the solution to my problem. I could not find such a fix for the non-Lx ConnectX-4 cards. Is it missing from those cards’ firmware?

Short update: This problem only occurs on recent Linux kernel versions. I looked into past commits regarding mlx5 and the MCAM register and noticed this commit.
When I revert that commit, the hwmon entry reappears, and I can read the temperature values of our ConnectX-4 cards.