Seems it might not be exactly 75C and the throttling happens at varying degrees.
Just some more detail: I am running a GPU-intensive CUDA program that drives GPU utilization to 100%.
My CPU clock frequency normally runs at 1734000, which is the max.
When I start my worker program, the CPU frequency begins to drop, and how far it drops depends on the temperature.
At 75-85C it drops to 1326000.
At 45-70C it drops to 1555500 or 1428000.
Anything below that, the CPU frequency stays at 1734000.
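To log these drops while the worker runs, it may help to read the current and maximum CPU frequency side by side. This is only a minimal sketch, assuming the board exposes the standard Linux cpufreq sysfs files (`scaling_cur_freq` and `cpuinfo_max_freq`, both in kHz). The function name and the `root` parameter are made up for illustration.

```python
from pathlib import Path


def cpu_freq_status(cpu=0, root="/sys/devices/system/cpu"):
    """Return (current_khz, max_khz, below_max) for one CPU core.

    Assumes the standard Linux cpufreq sysfs layout; `root` is only a
    parameter so the function can be pointed at a different directory.
    """
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    current_khz = int((base / "scaling_cur_freq").read_text().strip())
    max_khz = int((base / "cpuinfo_max_freq").read_text().strip())
    # "Below max" does not prove thermal throttling by itself: the
    # governor can also lower the clock when the core is idle.
    return current_khz, max_khz, current_khz < max_khz
```

Calling `cpu_freq_status()` once a second while the program runs, and writing the result next to a temperature reading, should show whether the frequency steps line up with the temperature ranges listed above.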
That is why I want to disable all throttling, if possible, to see whether the system is restricting performance.
Also this may be power related since in dmesg I often see “WARNING - Battery Over Current Limit hit, please refer to the Jetson Power management application note” while executing the program (though my hardware does not run on a battery).
OK, based on the details you gave, it might not be caused by temperature. What is your input voltage? Is it 19V? If not, please try 19V and see whether the throttling still happens.
I am seeing a similar issue. The CPU regulates its speed at the expected 89°C; however, the GPU seems to throttle back at 79°C.
For the CPU I am using onboard thermal zone1 (CPU-therm), and for the GPU I am using onboard thermal zone2 (GPU-therm).
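To double-check which zone index carries which name, the zones can be listed straight from sysfs. This is a rough sketch, assuming the generic Linux thermal sysfs layout (`/sys/class/thermal/thermal_zoneN/type` and `temp`, with `temp` in millidegrees Celsius). The function name is hypothetical, and the zone names on a given board may differ from the ones quoted here.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Map zone index -> (type name, temperature in degrees C).

    Assumes the generic Linux thermal sysfs layout, where `temp` is
    reported in millidegrees Celsius.
    """
    zones = {}
    for zone in Path(root).glob("thermal_zone*"):
        index = int(zone.name.replace("thermal_zone", ""))
        name = (zone / "type").read_text().strip()
        try:
            temp_c = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            temp_c = None  # some zones may not return a usable reading
        zones[index] = (name, temp_c)
    return zones
```

Printing the result for each index shows the name the kernel attaches to zone1 and zone2, which is worth confirming before comparing the readings against the CPU and GPU limits.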
A secondary issue is that there seems to be a 10°C temperature drop across the silicon; I would expect these readings to be similar.
I am using CPU and GPU stressing software which has all cores running at 100%. VDD input rail at 12V.
The maximum operating temperatures are T. cpu = 89°C and T. gpu = 90.5°C. What was the value of zone1 when you saw zone2 reach 79°C? In your case the throttling might be caused by T. cpu, not T. gpu.
EDIT:
The thermal zones are in different areas of the chip. Which zones did you observe with the 10C gap? How long does it keep working when you see that? Is it in a stable working state? The temperature gap is usually less than 3C, but it can sometimes peak at 10C, depending on the system's working state and heat dissipation.
When zone2 started throttling at 79°C, zone 1 was at ~89°C. In these conditions, CPU temp stabilised at 91°C with some throttle and GPU at 84. This was left for an hour.
In a second test I left it again for an hour, GPU throttled at zone2=86°C and zone1=89°C. This was a stable condition.
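As a quick check on which limit was actually reached in these two tests, the headroom of each zone against the limits quoted earlier (89°C for the CPU, 90.5°C for the GPU) can be worked out. This is only an illustrative sketch: it assumes zone1 maps to the CPU limit and zone2 to the GPU limit, which is itself an open question in this thread, and the helper names are made up.

```python
# Limits quoted earlier in the thread, in degrees C.
LIMITS_C = {"cpu": 89.0, "gpu": 90.5}


def thermal_margins(readings_c, limits_c=LIMITS_C):
    """Return the remaining headroom (limit minus reading) per sensor."""
    return {name: limits_c[name] - temp for name, temp in readings_c.items()}


def likely_limiter(readings_c, limits_c=LIMITS_C):
    """Return the sensor with the least headroom left."""
    margins = thermal_margins(readings_c, limits_c)
    return min(margins, key=margins.get)


# First test: zone1 about 89, zone2 = 79 when the GPU began throttling.
first = {"cpu": 89.0, "gpu": 79.0}
# Second test: zone1 = 89, zone2 = 86.
second = {"cpu": 89.0, "gpu": 86.0}

print(thermal_margins(first), likely_limiter(first))
print(thermal_margins(second), likely_limiter(second))
```

In both tests the CPU reading has no headroom left while the GPU reading still has 11.5°C and 4.5°C respectively, which fits the suggestion that the CPU limit, not the GPU limit, was the one being hit.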
Does the GPU throttle back at the hottest silicon temperature then, regardless of GPU reading?
Yes, essentially. What still doesn’t make a huge amount of sense is that zone 1 only reached 89°C when the GPU throttled. Unless zone 2 is not actually reading GPU temp and there is a 1.5°C temperature drop across the silicon?? Zone 2 never rose above 85°C with high throttling.
Generally, zone 1 reaches 89C sooner than zone 2 reaches 90.5C, but when the GPU load is much higher the throttling could be triggered by zone 2; this also depends on the thermal design.
My concern is just that zone 2 appears to control the GPU speed, and the GPU is throttling back well below 90.5°C. I have now noticed from the TX2 thermal design guide that the GPU temp is measured by AO-Therm (zone0). This would explain the temperature discrepancy. Can you confirm that this is the same for the TX1 and that the diode naming convention is just a bit misleading?