thermal throttling

Hi

We’re running performance tests on a system with a custom-designed carrier board, and we’re trying to characterize how the system performs at various temperatures and where its upper limit is.

What is the recommended way to find out whether the system is thermal throttling, and what level of throttling is applied?

We run long tests (more than an hour), so I would appreciate something that keeps logging the throttling status over time so that I can correlate it with the results from other sensors.

Any tips appreciated.

Many thanks
Daniel

Please run the tegrastats binary with an adjusted interval during your test. It will show the temperature of the SoC.
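
For long runs it can help to turn each tegrastats line into structured values that are easy to plot against other sensor data. The following is only a minimal sketch: the function name `parse_tegrastats_line` is made up for illustration, and it assumes the line layout seen in this thread, with per-core entries of the form `load%@freq` (frequency taken to be in MHz) and temperatures of the form `name@valueC`. Other L4T releases may print different fields, so check the output on your own system first.

```python
import re

# Assumed formats, based on the tegrastats lines quoted in this thread.
TEMP_RE = re.compile(r"(\w+)@(-?\d+(?:\.\d+)?)C\b")
CPU_BLOCK_RE = re.compile(r"CPU \[([^\]]*)\]")
CORE_RE = re.compile(r"(\d+)%@(\d+)")


def parse_tegrastats_line(line):
    """Extract temperatures and per-core CPU load/frequency from one line."""
    temps = {name: float(value) for name, value in TEMP_RE.findall(line)}
    block = CPU_BLOCK_RE.search(line)
    # Cores that do not report "load%@freq" (for example, offline cores) are skipped.
    cores = CORE_RE.findall(block.group(1)) if block else []
    return {
        "temps_c": temps,
        "cpu_load_pct": [int(load) for load, _ in cores],
        "cpu_freq_mhz": [int(freq) for _, freq in cores],
    }
```

Feeding every logged line through this function, together with a timestamp taken when the line was read, gives a time series of clock frequency and temperature that can be lined up with the climate chamber data.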

Hi

Thanks for the tip. I was already trying tegrastats, but I don’t see how it tells you whether the chip has applied any throttling.

For example, I got this in a case where throttling was certainly happening (the supply line to the ECU drew less current than in other cases):

xavier@fast-prototype:~$ tegrastats
RAM 7413/15683MB (lfb 1527x4MB) CPU [100%@1113,100%@1113,100%@1113,100%@1113,84%@1113,90%@1113,88%@1113,81%@1113] EMC_FREQ 0% GR3D_FREQ 99% AO@85.5C GPU@89C Tboard@82C Tdiode@85.75C AUX@83.5C CPU@89.5C thermal@86.95C PMIC@100C GPU 7078/7078 CPU 2996/2996 SOC 10164/10164 CV 0/0 VDDRQ 2085/2085 SYS5V 3686/3686

but I see no significant difference from the case where there was no throttling:

xavier@fast-prototype:~$ tegrastats
RAM 7413/15683MB (lfb 1511x4MB) CPU [100%@1190,100%@1190,100%@1190,100%@1190,88%@1190,80%@1190,79%@1190,88%@1190] EMC_FREQ 0% GR3D_FREQ 99% AO@38.5C GPU@42.5C Tboard@39C Tdiode@41.5C AUX@40C CPU@43.5C thermal@41.8C PMIC@100C GPU 8162/8162 CPU 2843/2843 SOC 9073/9073 CV 0/0 VDDRQ 2106/2106 SYS5V 3524/3524

In my opinion, just reading temperatures does not tell me much. As I’m running tests in a climate chamber at different temperatures, I actually want to find the highest temperature at which the system can still work at 100% without throttling.
… Or am I missing something?
I need something that can tell me that throttling has occurred and, if possible, what level of throttling was applied.

I’ve noticed that sometimes you get some kernel messages like:

[  118.400691] FAN rising trip_level:3 cur_temp:72400 trip_temps[4]:81000
[  339.040743] FAN rising trip_level:4 cur_temp:81100 trip_temps[5]:140000

that look like some kind of throttling information. Unfortunately, I haven’t found any explanation of what these messages mean.
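
Whatever these messages turn out to mean, they can at least be collected from the kernel log and put on the same time axis as the other measurements. The sketch below is hedged accordingly: `parse_fan_trip` is a hypothetical helper, the pattern is taken only from the two lines quoted above, and dividing the values by 1000 assumes they are millidegrees Celsius (consistent with 72400 at roughly 72 C, but not confirmed here). The "FAN" prefix suggests they may describe fan speed steps rather than clock throttling, which is also an assumption.

```python
import re

# Pattern derived only from the two kernel messages quoted in this thread.
FAN_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+FAN\s+(?P<direction>\w+)\s+"
    r"trip_level:(?P<level>\d+)\s+cur_temp:(?P<cur>\d+)\s+"
    r"trip_temps\[(?P<index>\d+)\]:(?P<trip>\d+)"
)


def parse_fan_trip(line):
    """Return the fields of one 'FAN ... trip_level' line, or None if no match."""
    match = FAN_RE.search(line)
    if match is None:
        return None
    return {
        "uptime_s": float(match["uptime"]),
        "direction": match["direction"],
        "trip_level": int(match["level"]),
        # Assumption: raw values are millidegrees Celsius.
        "cur_temp_c": int(match["cur"]) / 1000.0,
        "trip_index": int(match["index"]),
        "trip_temp_c": int(match["trip"]) / 1000.0,
    }
```

Running this over saved kernel log output gives one record per trip event, with the uptime in seconds available for correlating against the test log.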

Many thanks
Daniel

  1. I think you didn’t run tegrastats with sudo, so some information was not shown.

  2. tegrastats can only tell you whether your device is going through thermal throttling. For example, if you see the CPU/GPU/EMC frequency dropping while the SoC temperature is high, it means throttling is happening.

  3. If you want to know more about throttling, please refer to our L4T documentation:

https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide%2Fpower_management_jetson_xavier.html%23

You could try changing some of the settings for each thermal zone in this device tree file:

nvidia/platform/t19x/common/kernel-dts/t19x-common-platforms/tegra194-thermal.dtsi
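
To keep a record of throttling over a long run, one option is to poll the standard Linux thermal sysfs interface, where each thermal zone exposes `type` and `temp` and each cooling device exposes `type`, `cur_state` and `max_state`. The following is a minimal sketch, not an NVIDIA tool: the function names and CSV layout are made up for illustration, and which cooling devices actually correspond to clock throttling on Xavier is not stated in this thread, so check the `type` names on your own board. A `cur_state` above 0 generally means that cooling action is currently applied.

```python
import csv
import time
from pathlib import Path


def read_thermal_snapshot(root="/sys/class/thermal"):
    """Return ([(zone_type, temp_c)], [(cooling_type, cur_state, max_state)])."""
    root = Path(root)
    zones, cooling = [], []
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            # temp is reported in millidegrees Celsius by the thermal framework.
            zones.append((name, int((zone / "temp").read_text()) / 1000.0))
        except (OSError, ValueError):
            continue  # skip zones that cannot be read
    for dev in sorted(root.glob("cooling_device*")):
        try:
            name = (dev / "type").read_text().strip()
            cur = int((dev / "cur_state").read_text())
            max_state = int((dev / "max_state").read_text())
            cooling.append((name, cur, max_state))
        except (OSError, ValueError):
            continue  # skip devices that cannot be read
    return zones, cooling


def log_thermal(out_path, interval_s=1.0, samples=None, root="/sys/class/thermal"):
    """Append CSV rows: unix time, kind, name, value, max_state (cooling only)."""
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        count = 0
        while samples is None or count < samples:
            now = time.time()
            zones, cooling = read_thermal_snapshot(root)
            for name, temp_c in zones:
                writer.writerow([now, "zone", name, temp_c, ""])
            for name, cur, max_state in cooling:
                writer.writerow([now, "cooling", name, cur, max_state])
            f.flush()  # keep the file usable if the run is interrupted
            count += 1
            if samples is None or count < samples:
                time.sleep(interval_s)
```

Calling `log_thermal("thermal_log.csv", interval_s=1.0)` in the background during a test run would give a timestamped file in which any nonzero cooling state can be matched against the chamber temperature and the supply current.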