We are working on a GPU-heavy application that runs inference on an Xavier NX with JetPack 4.6. To get maximum performance we are using the predefined 20W 6CORE mode. While the application runs, we repeatedly see the “System throttled due to overcurrent” message.
We’ve seen in other posts, such as this and this, that one workaround is to set the maximum current limit to 5A; however, it has been reported that this does not work under high GPU load. That is our case: increasing the maximum limit is not enough to solve the problem. We would like to understand the issue better, so we have the following questions:
Is this warning related to temperature or to power consumption? If neither, could you explain what triggers it?
Is it harmful for the device to just run the application and ignore the warnings?
What are the consequences of the system being throttled? Are the clock frequencies lowered? Will the performance drop?
Thanks for your answers @WayneWWW, that is really helpful. A couple more questions:
You mentioned that it is related to power consumption, so is there a known power consumption limit above which the warning is triggered? Perhaps something we could monitor with the tegrastats tool?
Also, to elaborate on how the performance drops:
Does the system switch to a lower power mode? I’ve noticed that when the warning shows and we query the current mode with nvpmodel -q, it remains unchanged.
Are the clock frequencies lowered for the CPU or the GPU? And if so, do you know to what values?
The official solution is to use the power estimator to calculate the power budget of your use case. You can then create your own nvpmodel for that use case.
When throttling happens, performance (CPU/GPU/EMC frequency) drops for a moment. It is therefore not easy to observe with tegrastats unless the throttling keeps happening.
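Since the drop is brief, one way to try to catch it is to sample a frequency node at a much shorter interval than a typical tegrastats run. The following is only a rough sketch, not an NVIDIA tool: it assumes the standard Linux cpufreq node for CPU 0, and the helper names, the 90% threshold, and the 10 ms interval are hypothetical choices for illustration. The GPU and EMC frequency nodes are board-specific, so you would need to supply those paths yourself.

```python
import time
from typing import Callable, List, Tuple

# Standard Linux cpufreq node (value in kHz). GPU/EMC nodes are
# board-specific and are NOT assumed here; pass your own path to read_khz().
CPU0_FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"


def read_khz(path: str) -> int:
    """Read a single integer frequency value from a sysfs-style file."""
    with open(path) as f:
        return int(f.read().strip())


def watch_for_drops(
    read: Callable[[], int],
    threshold: int,
    samples: int,
    interval_s: float = 0.01,
) -> List[Tuple[int, int]]:
    """Call read() `samples` times; return (index, value) for every
    reading that falls below `threshold`."""
    drops = []
    for i in range(samples):
        value = read()
        if value < threshold:
            drops.append((i, value))
        time.sleep(interval_s)
    return drops


if __name__ == "__main__":
    # Hypothetical policy: treat anything under 90% of the first reading
    # as a drop. Start this while the application is already under load.
    baseline = read_khz(CPU0_FREQ_PATH)
    found = watch_for_drops(
        lambda: read_khz(CPU0_FREQ_PATH),
        threshold=int(baseline * 0.9),
        samples=1000,
    )
    for index, khz in found:
        print(f"sample {index}: {khz} kHz (baseline {baseline} kHz)")
```

Keep in mind that a lower reading can also come from ordinary frequency scaling when the load dips, so a drop seen here is only a hint and should be correlated with the overcurrent message itself.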