Hi,
Do you run the inference with TensorRT?
If yes, you can run the model on the DLA by passing --useDLACore=[ID].
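For example, with trtexec (the model path is illustrative; note that DLA requires FP16 or INT8 precision, and --allowGPUFallback lets unsupported layers run on the GPU):

```shell
# Build and run an engine on DLA core 0 (illustrative model path)
trtexec --onnx=model.onnx --useDLACore=0 --allowGPUFallback --fp16
```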
When the throttling message appears, the system automatically lowers the clocks to protect the device.
If most of your tasks run on the GPU, you can switch to a manual clock mode and lower the CPU clock so the message no longer appears.
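A minimal sketch on Jetson (the frequency value is an example; the available CPU frequencies and the exact sysfs path can vary by board, so please check yours first):

```shell
# Pin clocks so they no longer change dynamically (Jetson utility)
sudo jetson_clocks
# Cap the max CPU frequency via cpufreq sysfs (value in kHz is an
# example; list valid values in scaling_available_frequencies)
echo 1200000 | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
```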
Thanks.