Nvfancontrol strange behavior

Hello,

I noticed that nvfancontrol is not working properly on the devkit or on our custom-designed board when using the Orin Nano.

With “nvfancontrol --verbose” I see the following temperatures:

FAN1: avgTemp: 56948, Current PWM = 64, Current RPM = 1344
FAN1: avgTemp: 56954, Current PWM = 64, Current RPM = 1344
FAN1: avgTemp: 56947, Current PWM = 64, Current RPM = 1344
FAN1: avgTemp: 56873, Current PWM = 64, Current RPM = 1344
FAN1: avgTemp: 57011, Current PWM = 64, Current RPM = 1344

Tegrastats shows me a completely different CPU temp:

03-27-2023 19:13:18 RAM 826/7473MB (lfb 1539x4MB) SWAP 0/3736MB (cached 0MB) CPU [0%@1510,7%@1510,4%@1510,3%@1510,2%@1510,3%@1510] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[624,0] VIC_FREQ 435 APE 200 CV0@-256C CPU@49.531C SOC2@47.906C SOC0@46.093C CV1@-256C GPU@48.812C tj@49.531C SOC1@47.75C CV2@-256C VDD_IN 4952mW/4952mW VDD_CPU_GPU_CV 949mW/949mW VDD_SOC 1426mW/1426mW

Doing the same on the board with a Xavier NX I see the same temperature with nvfancontrol and tegrastats:

FAN1: avgTemp: 43450, Current PWM = 130, Current RPM = 0
FAN1: avgTemp: 43100, Current PWM = 130, Current RPM = 0
FAN1: avgTemp: 43300, Current PWM = 130, Current RPM = 0
~# tegrastats
10-10-2023 09:03:50 RAM 4260/7519MB (lfb 442x4MB) SWAP 0/3760MB (cached 0MB) CPU [27%@1420,17%@1420,46%@1420,55%@1420,11%@1420,24%@1420] EMC_FREQ 2%@1866 GR3D_FREQ 0%@[1109] VIC_FREQ 601 APE 150 AUX@42.5C CPU@44C thermal@43.1C AO@42C GPU@43C PMIC@50C VDD_IN 6780mW/6780mW VDD_CPU_GPU_CV 2128mW/2128mW VDD_SOC 1653mW/1653mW

The behavior with Orin is that the fan sometimes runs at 0 and sometimes at 255. It no longer seems to start spinning after boot. nvfancontrol always writes the same value. Stopping nvfancontrol and writing directly to the PWM makes the fan spin differently.

Where do I set which temperature sensor is used to control nvfancontrol?

Hi,

Please rely on tegrastats for temperatures instead of nvfancontrol.
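
For a cross-check that does not depend on either tool's output format, the kernel's thermal zones can be read directly from sysfs. The sketch below assumes the standard Linux layout under /sys/class/thermal, where each thermal_zone* directory holds a `type` file and a `temp` file in millidegrees Celsius. The function name and the `base` parameter are illustrative only.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone type: temperature in deg C} for every readable zone."""
    zones = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones may fail to read or return non-numeric data; skip them.
            continue
        zones[name] = millideg / 1000.0
    return zones


if __name__ == "__main__":
    for name, temp_c in read_thermal_zones().items():
        print(f"{name}: {temp_c:.3f} C")
```

Comparing these values with the tegrastats line shows whether both are reporting the same zones.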

What exactly is the issue here?
What’s your fan profile currently? You mean you can only control the fan by directly writing PWM?

The issue is that when I boot the Orin Nano on our custom board, nvfancontrol does not seem to control the fan correctly.

Yesterday I only saw the system running the fan at 0 or at 255 after booting.
The value never changed. That might be caused by something different.

Today I started nvfancontrol and watched the PWM value increase gradually from 128 to 255, while the temp reported by nvfancontrol rose from 62791 to 67173.

After reaching 255 I get the following message:

NVFAN ERROR: FAN1: Cannot turn the fan on even the PWM is set to MAX PWM (255), please check if the fan is uninstalled or faulty.
FAN1: avgTemp: 67180, Current PWM = 255, Current RPM = 0

Did our Hardware team implement something wrong so that the RPM cannot be read?
With Xavier NX the same custom board works with proper fan control.

Tegrastats showed this at the starting PWM of 128:

10-11-2023 04:20:03 RAM 812/7472MB (lfb 1546x4MB) SWAP 0/3736MB (cached 0MB) CPU [13%@1510,36%@1510,6%@1510,4%@1510,4%@1510,6%@1510] EMC_FREQ 1%@2133 GR3D_FREQ 0%@[624] VIC_FREQ 435 APE 200 CV0@-256C CPU@43C SOC2@41.781C SOC0@40.156C CV1@-256C GPU@42.375C tj@42.968C SOC1@40.937C CV2@-256C VDD_IN 5458mW/5458mW VDD_CPU_GPU_CV 1107mW/1107mW VDD_SOC 1463mW/1463mW

And this at pwm 255:

10-11-2023 04:30:33 RAM 866/7472MB (lfb 1526x4MB) SWAP 0/3736MB (cached 0MB) CPU [22%@1510,4%@1510,13%@1510,2%@1510,2%@1510,7%@1510] EMC_FREQ 1%@2133 GR3D_FREQ 0%@[624] VIC_FREQ 435 APE 200 CV0@-256C CPU@38.875C SOC2@37.812C SOC0@36.125C CV1@-256C GPU@38.562C tj@38.875C SOC1@36.937C CV2@-256C VDD_IN 5339mW/5220mW VDD_CPU_GPU_CV 1028mW/929mW VDD_SOC 1463mW/1463mW

Why does the PWM get increased while the temperature drops?

FAN1:FAN_PROFILE:quiet
FAN1:FAN_GOVERNOR:cont
FAN1:FAN_CONTROL:close_loop
#
# Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.
#

POLLING_INTERVAL 2

<FAN 1>
        TMARGIN ENABLED
        FAN_GOVERNOR cont {
                STEP_SIZE 10
        }
        FAN_CONTROL close_loop {
                RPM_TOLERANCE 100
        }
        FAN_PROFILE quiet {
                #TEMP   HYST    PWM     RPM
                0       0       255     6000
                10      0       255     6000
                11      0       187     4000
                31      0       187     4000
                70      0       0       0
                105     0       0       0
        }
        FAN_PROFILE cool {
                #TEMP   HYST    PWM     RPM
                0       0       255     6000
                35      0       255     6000
                70      0       0       0
                105     0       0       0
        }
        THERMAL_GROUP 0 {
                GROUP_MAX_TEMP 105
                #Thermal-Zone Coeffs Max-Temp
                CPU-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
                GPU-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
                SOC0-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
                SOC1-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
                SOC2-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
        }
        FAN_DEFAULT_CONTROL close_loop
        FAN_DEFAULT_PROFILE quiet
        FAN_DEFAULT_GOVERNOR cont
        KICKSTART_PWM 64
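
One reading that would make the two sets of numbers agree, offered as an assumption rather than something confirmed in this thread: with `TMARGIN ENABLED`, `avgTemp` may be a thermal margin (`GROUP_MAX_TEMP` minus the mean of the zones in `THERMAL_GROUP 0`), in millidegrees, instead of a temperature. The sketch below redoes that arithmetic with the 03-27-2023 tegrastats sample from the first post and then interpolates linearly in the `quiet` profile. The linear interpolation and the function names are also assumptions.

```python
GROUP_MAX_TEMP = 105.0  # from THERMAL_GROUP 0 in the config above

# (margin, PWM, RPM) rows of FAN_PROFILE quiet; the HYST column is omitted.
QUIET = [(0, 255, 6000), (10, 255, 6000), (11, 187, 4000),
         (31, 187, 4000), (70, 0, 0), (105, 0, 0)]


def thermal_margin(zone_temps_c):
    """Assumed meaning of avgTemp: max temp minus the mean zone temperature."""
    return GROUP_MAX_TEMP - sum(zone_temps_c) / len(zone_temps_c)


def interpolate(profile, x):
    """Linearly interpolate (PWM, RPM) between neighbouring profile rows."""
    if x <= profile[0][0]:
        return profile[0][1:]
    for (x0, pwm0, rpm0), (x1, pwm1, rpm1) in zip(profile, profile[1:]):
        if x0 <= x <= x1:
            f = (x - x0) / (x1 - x0)
            return (pwm0 + f * (pwm1 - pwm0), rpm0 + f * (rpm1 - rpm0))
    return profile[-1][1:]


# CPU, GPU, SOC0, SOC1, SOC2 from the 03-27-2023 tegrastats line.
sample = [49.531, 48.812, 46.093, 47.75, 47.906]
margin = thermal_margin(sample)
pwm, rpm = interpolate(QUIET, margin)
print(f"margin={margin:.3f} C, target PWM={pwm:.0f}, target RPM={rpm:.0f}")
```

Under these assumptions the margin comes out at about 57 C, in the same range as the `avgTemp` values of roughly 56.9 that nvfancontrol printed, and the interpolated targets land near the logged PWM of 64 and RPM of 1344. The same reading would mean a rising `avgTemp` corresponds to a cooling SoC.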

Hi,

Was the fan still running at the time? Or had the fan stopped and the RPM could not be read either?
Are you able to reproduce it on a DevKit board?

The fan runs at the speed that the PWM value shown by nvfancontrol indicates.

In the devkit the behavior is different as it is using a different temperature.
But it does not seem to run as it should either.

I changed the link in /etc/nvfancontrol.conf from nvfancontrol_p3767_0000.conf to nvfancontrol_p3668.conf and nvfancontrol suddenly shows the correct temperature and controls the fan with either 0 or 130 pwm.

Maybe the nvfancontrol_p3767_0000.conf is somehow faulty?
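
Before swapping the link, it can help to confirm which file /etc/nvfancontrol.conf actually resolves to and which control mode that file selects. A minimal sketch, assuming the path is a symlink as described above and that the mode appears on a `FAN_DEFAULT_CONTROL` line as in the config posted earlier. The function name is made up for illustration.

```python
import os


def active_fan_config(link="/etc/nvfancontrol.conf"):
    """Return (resolved path, FAN_DEFAULT_CONTROL value or None)."""
    target = os.path.realpath(link)
    control = None
    with open(target) as conf:
        for line in conf:
            fields = line.split()
            # Match only the setting itself, not comments or other keys.
            if len(fields) >= 2 and fields[0] == "FAN_DEFAULT_CONTROL":
                control = fields[1]
    return target, control


if __name__ == "__main__":
    path, control = active_fan_config()
    print(f"{path}: FAN_DEFAULT_CONTROL = {control}")
```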

Hi,

Can you be more specific about this?

Is this issue reproducible on a DevKit board or not?

I think I’ve found the issue. For the 3767 the config includes FAN_DEFAULT_CONTROL close_loop.
Since I don’t get any RPM values on my custom board, the control fails according to the dev guide.
https://docs.nvidia.com/jetson/archives/r35.1/DeveloperGuide/text/SD/PlatformPowerAndPerformance/JetsonOrinNxSeriesAndJetsonAgxOrinSeries.html#fan-control
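
The failure mode described above can be sketched with a toy closed-loop controller. This is not nvfancontrol's actual algorithm: the step size, the update rule, the target RPM, and the function names are assumptions chosen only to show why a tachometer that always reads 0 RPM drives the PWM to its maximum regardless of temperature.

```python
MAX_PWM = 255


def closed_loop_step(pwm, target_rpm, measured_rpm, step=10, tolerance=100):
    """One hypothetical control step: nudge PWM toward the RPM target."""
    if measured_rpm < target_rpm - tolerance:
        pwm += step
    elif measured_rpm > target_rpm + tolerance:
        pwm -= step
    return max(0, min(MAX_PWM, pwm))


def simulate(start_pwm, target_rpm, read_rpm, steps=20):
    """Run the controller for a fixed number of polling intervals."""
    pwm = start_pwm
    history = [pwm]
    for _ in range(steps):
        pwm = closed_loop_step(pwm, target_rpm, read_rpm(pwm))
        history.append(pwm)
    return history


# Tach line not connected: every reading is 0 RPM, whatever the fan does.
broken_tach = simulate(128, target_rpm=700, read_rpm=lambda pwm: 0)
print(broken_tach)
```

In this sketch the PWM climbs from 128 in fixed steps and then stays pinned at 255, which resembles the ramp reported earlier in the thread, while a fan model that returns a plausible RPM settles once it is within the tolerance.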

Do I need to change the device tree in order to measure the RPM values on the custom board?

Sounds like you should do it first.

Do you have any documentation for that? I’ve no idea why the board would not read the RPM value.

How did you confirm it?

nvfancontrol shows the RPM at 0.
Unfortunately our hardware design team used the FAN_TACH pin for the SDMMC_CD.
Our FAN_TACH is connected to the PT.07. I don’t have an option to use the PT.07 for FAN_TACH in the pinmux. Is there any way to make it work anyway?

I think you need to either alter your hardware design or use another fan config file that does not require reading the RPM.

I was afraid you’d say that. We use another config file now, since your dev guide says closed-loop control with RPM feedback is not the suggested method due to fan wear.
