firmware-version: 20.29.2002 (MT_0000000225)
Feb 23 14:35:58 localhost kernel: mlx5_core 0000:03:00.0: temp_warn:170:(pid 0): High temperature on sensors with bit set 0 1
Feb 23 14:35:58 localhost kernel: mlx5_core 0000:03:00.1: temp_warn:170:(pid 0): High temperature on sensors with bit set 0 1
Feb 23 14:39:08 localhost kernel: mlx5_core 0000:03:00.1: print_health_info:421:(pid 0): synd 0x10: High temperature
Feb 23 14:39:08 localhost kernel: mlx5_core 0000:03:00.0: print_health_info:421:(pid 0): synd 0x10: High temperature
Thank you for posting your inquiry on the NVIDIA Networking Community.
When using an active cable, make sure it is one of the validated and supported cables listed in the release notes for your installed firmware: https://docs.mellanox.com/display/ConnectX6Firmwarev20292002/Firmware+Compatible+Products
Please also make sure the adapter and transceiver are getting adequate cooling by increasing the fan speed of the system.
If you are using a validated and supported cable and still experience this issue after increasing the fan speed of the system, please open a support case (a valid support contract is required) by sending an email to email@example.com.
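To check whether the warnings keep appearing after a cable or fan-speed change, the kernel log can be scanned for the mlx5_core high-temperature messages. The following is only a minimal sketch in Python, not an NVIDIA tool: the function name `count_temp_warnings` and the regular expression are assumptions for illustration, written against the message format quoted in this thread.

```python
import re
from collections import Counter

# Assumed pattern: matches mlx5_core messages of the form quoted in this
# thread and captures the PCI address of the reporting port.
PATTERN = re.compile(
    r"mlx5_core (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r".*High temperature"
)


def count_temp_warnings(lines):
    """Return a Counter mapping PCI address to high-temperature message count."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("pci")] += 1
    return counts


# Sample input: the four kernel messages quoted at the top of this thread.
SAMPLE = [
    "Feb 23 14:35:58 localhost kernel: mlx5_core 0000:03:00.0: temp_warn:170:(pid 0): High temperature on sensors with bit set 0 1",
    "Feb 23 14:35:58 localhost kernel: mlx5_core 0000:03:00.1: temp_warn:170:(pid 0): High temperature on sensors with bit set 0 1",
    "Feb 23 14:39:08 localhost kernel: mlx5_core 0000:03:00.1: print_health_info:421:(pid 0): synd 0x10: High temperature",
    "Feb 23 14:39:08 localhost kernel: mlx5_core 0000:03:00.0: print_health_info:421:(pid 0): synd 0x10: High temperature",
]

if __name__ == "__main__":
    for pci, n in sorted(count_temp_warnings(SAMPLE).items()):
        print(f"{pci}: {n} high-temperature message(s)")
```

In practice the lines would come from the kernel log itself (for example the output of `dmesg` or `journalctl -k`) rather than from the sample list. If the counts stop growing after the change, the cooling adjustment has likely helped.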
Thank you and regards,
~NVIDIA Networking Technical Support