How to detect thermal throttling events on a Tegra CPU

Hello,

I’m new to the NVIDIA Jetson Xavier and am struggling to find out how to detect CPU thermal throttling events.

Here is tegrastats output:

RAM 1591/31919MB (lfb 6682x4MB) SWAP 0/15959MB (cached 0MB) CPU [1%@1190,0%@1190,0%@1190,0%@1190,off,off,off,off] EMC_FREQ 0%@1600 GR3D_FREQ 0%@675 APE 150 MTS fg 0% bg 0% AO@30.5C GPU@31.5C Tdiode@34.75C PMIC@100C AUX@31C CPU@32C thermal@31.45C Tboard@32C GPU 622/622 CPU 311/311 SOC 1554/1554 CV 0/0 VDDRQ 777/777 SYS5V 2415/2415

Please tell me how to detect thermal throttling events on the Tegra CPU, if there is a way.
Is this not supported in user space? Do I need to develop a kernel driver?

Also, tegrastats doesn’t report GPU memory utilization or throttling events, although NVML can report them on other platforms.

Please tell me how to detect them on the Xavier GPU cores, if there is a way.

Thank you,
Ito

Hi, please refer to the thermal design guide for details on throttling: https://developer.nvidia.com/embedded/dlc/jetson-agx-xavier-series-thermal-design-guide

Tegrastats only shows total memory utilization.

Thank you for the reply.

The document you mentioned explains that temperature, voltage, current, and power consumption can be read directly from sysfs, but it does not describe thermal throttling events.

I would like to know whether the CPU and GPU are actually throttling their clocks (rather than inferring it from temperature), and the throttling reasons if possible.
On an Intel platform with a GTX card, the throttling reasons can be read with the NVML API nvmlDeviceGetCurrentClocksThrottleReasons().

I know the Tegra architecture is different from that of GTX cards.
Can a flag or some other signal for throttling events/status be read directly from a sysfs node?

I have found that there is no separate GPU memory on Tegra, so GPU memory utilization is the same as CPU memory utilization.

Hi,

Sorry, there is no event record for thermal throttling.

But we do have one for over-current-based HW throttling. You can read it from /sys/kernel/debug/bpmp/debug/soctherm/oc/oc_N/event_cnt, where N is the SOCTHERM_OC pin number.
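
A minimal sketch of polling these counters from Python, in case it helps. It assumes that debugfs is mounted and readable (normally root only) and that each event_cnt node holds a single decimal integer; that value format is an assumption, not something confirmed above. The function names are hypothetical.

```python
import glob
import os
import time

# Path pattern taken from the reply above; oc_* matches every SOCTHERM_OC pin.
OC_GLOB = "/sys/kernel/debug/bpmp/debug/soctherm/oc/oc_*/event_cnt"


def read_oc_counts(pattern=OC_GLOB):
    """Return {pin_directory_name: count} for every readable event_cnt node."""
    counts = {}
    for path in sorted(glob.glob(pattern)):
        pin = os.path.basename(os.path.dirname(path))  # e.g. "oc_1"
        try:
            with open(path) as f:
                # Assumption: the node contains one decimal integer.
                counts[pin] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or unexpected format: skip this pin
    return counts


def new_events(previous, current):
    """Return {pin: delta} for pins whose counter grew since the last poll."""
    deltas = {}
    for pin, count in current.items():
        delta = count - previous.get(pin, 0)
        if delta > 0:
            deltas[pin] = delta
    return deltas


def watch(poll_seconds=1.0, iterations=10, pattern=OC_GLOB):
    """Poll the counters and print any increase between two polls."""
    previous = read_oc_counts(pattern)
    for _ in range(iterations):
        time.sleep(poll_seconds)
        current = read_oc_counts(pattern)
        for pin, delta in new_events(previous, current).items():
            print(f"{pin}: {delta} new over-current event(s)")
        previous = current
```

Calling watch() as root prints a line whenever a counter increases between two polls. This only shows that over-current throttling happened during the interval, not when it started or ended.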

The thermal zones' trip temperature sysfs nodes can be used to find the thermal SW throttling thresholds, and the SOC thermal debugfs nodes can be used to find the thermal HW throttling thresholds. You need to monitor the temperature yourself to determine whether the thermal limits have been crossed.
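
As a rough illustration of the manual monitoring described here, the sketch below compares each thermal zone's temperature with its trip points. It uses the standard Linux thermal sysfs layout (/sys/class/thermal/thermal_zone*/temp and trip_point_*_temp, both in millidegrees Celsius). Which zones and trip types correspond to SW throttling on a given Jetson is not stated in this thread, so treat that mapping as something to check on the device. The function names are hypothetical.

```python
import glob
import os

THERMAL_ROOT = "/sys/class/thermal"


def _read(path):
    """Read a sysfs node and return its content without the trailing newline."""
    with open(path) as f:
        return f.read().strip()


def read_zones(root=THERMAL_ROOT):
    """Return one dict per thermal zone: name, temperature, trip points (deg C)."""
    zones = []
    for zone_dir in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            zone = {
                "name": _read(os.path.join(zone_dir, "type")),
                "temp_c": int(_read(os.path.join(zone_dir, "temp"))) / 1000.0,
                "trips": [],
            }
        except (OSError, ValueError):
            continue  # zone not readable right now: skip it
        pattern = os.path.join(zone_dir, "trip_point_*_temp")
        for trip_path in sorted(glob.glob(pattern)):
            try:
                trip_c = int(_read(trip_path)) / 1000.0
                # trip_point_N_temp has a sibling node trip_point_N_type
                trip_type = _read(trip_path[: -len("temp")] + "type")
            except (OSError, ValueError):
                continue
            zone["trips"].append((trip_type, trip_c))
        zones.append(zone)
    return zones


def crossed_trips(zone):
    """Return the (type, temperature) trip points at or below the current reading."""
    return [(t, c) for t, c in zone["trips"] if zone["temp_c"] >= c]


def report(root=THERMAL_ROOT):
    """Print every zone whose temperature has reached at least one trip point."""
    for zone in read_zones(root):
        hits = crossed_trips(zone)
        if hits:
            print(f"{zone['name']}: {zone['temp_c']:.1f} C, crossed {hits}")
```

Calling report() periodically gives a user-space approximation of "is this zone past its threshold". It infers throttling from temperature and does not read any throttling flag.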

Thank you for the prompt response.
The event_cnt node seems useful; however, it is just a counter, not an event that indicates when a slowdown starts or when operation returns to normal. It is a kind of history.
I understand that I have no choice but to monitor the temperature.
I’m grateful for your kind support.