Hello NVIDIA Team and Community,
I am currently working with an NVIDIA ConnectX-6 (MT28908) network adapter on a RHEL 9.7 system (Kernel 5.14). I have already compiled and set up the DPDK environment for high-performance testing.
I would like to monitor the real-time temperature of the adapter during high-load stress tests to prevent overheating and ensure stability.
My Questions:
- Besides lm_sensors, what is the recommended official NVIDIA tool to monitor the ASIC core temperature in real-time?
- How can I monitor the Optical Module (QSFP/Transceiver) temperature if the card is being used by DPDK?
- Are there any specific DNF/YUM packages or MFT components I should check to ensure I have all the diagnostic tools (like mlxlink or mxtool)?