I am conducting tests on an AGX Orin with 32GB RAM. Before running LLaMA inference, I modified the settings in /sys/class/devfreq/17000000.gpu by setting the governor to userspace and configuring both max_freq and min_freq to 306000000. However, after launching the inference script, I observed that the actual GPU frequency was 930750000. How can I ensure that the GPU frequency remains fixed?
*** Please note that this reply is generated by LLM automatically ***
To ensure that the GPU frequency remains fixed, you need to consider the power management and thermal throttling mechanisms on the Jetson AGX Orin module.
Firstly, the Jetson AGX Orin module has a Thermal Design Power (TDP) budget, which is set to 40W for the AGX Orin 32GB module. The module has an over current (OC) limit set for the summation channel to prevent the total module power from exceeding the TDP budget. If the instantaneous current of the summation channel exceeds the configured OC limit, the module will perform hardware-based clock throttling on the GPU and CPU for a configurable period of time.
In your case, you have set the MAXN mode to SUPER and configured both GPU and CPU to their maximum frequencies. However, if the total module power exceeds the TDP budget, the module will throttle the GPU frequency to a lower value to stay within the thermal budget.
To prevent this thermal throttling, you can try the following:
- Build a custom power mode: As recommended in the documentation, you can build a custom power mode to find the right balance between power consumption and performance for your application. This will allow you to set a power budget that is within the TDP limit, preventing thermal throttling.
- Monitor the module’s power consumption: Use tools like `tegrastats` or `nvpmodel` to monitor the module’s power consumption and adjust your power mode accordingly.
- Adjust the GPU frequency: Instead of setting the GPU frequency to its maximum value, try setting it to a lower value that is within the power budget. This will prevent thermal throttling and ensure that the GPU frequency remains fixed.
- Use a different cooling solution: If you are using a passively cooled GPU, consider using an actively cooled GPU or a different cooling solution to improve heat dissipation.
In summary, to ensure that the GPU frequency remains fixed, you need to consider the power management and thermal throttling mechanisms on the Jetson AGX Orin module. Building a custom power mode, monitoring power consumption, adjusting the GPU frequency, and using a different cooling solution are some strategies you can use to prevent thermal throttling and maintain a fixed GPU frequency.
Here is a comparative analysis of the different strategies:
| Strategy | Description | Advantages | Disadvantages |
|---|---|---|---|
| Build a custom power mode | Set a power budget within the TDP limit | Prevents thermal throttling, maintains fixed GPU frequency | Requires knowledge of power management, may require trial and error |
| Monitor power consumption | Use tools to monitor power consumption and adjust power mode | Helps identify power consumption patterns, allows for adjustments | May require frequent monitoring, may not prevent thermal throttling |
| Adjust GPU frequency | Set GPU frequency to a lower value within power budget | Prevents thermal throttling, maintains fixed GPU frequency | May result in lower performance |
| Use a different cooling solution | Improve heat dissipation using a different cooling solution | Prevents thermal throttling, maintains fixed GPU frequency | May require additional hardware, may not be feasible in all cases |
By considering these strategies and comparing their advantages and disadvantages, you can choose the best approach to ensure that the GPU frequency remains fixed on your Jetson AGX Orin module.
First of all, thank you very much for your reply. However, the issue does not seem to be resolved yet.
To clarify, I did not attempt to set the GPU frequency to its maximum. On the contrary, I intended to lock it to a lower frequency, but my configuration appears to have had no effect.
I used the exact same method to adjust the CPU frequency by modifying the settings in /sys/devices/system/cpu/cpufreq/policy0, and the CPU frequency was successfully locked as expected.
Could there be an issue with my configuration? Why is it not taking effect for the GPU? Is there a reliable way to actually lock the GPU frequency on this platform?
[INFO] /sys/class/devfreq/17000000.gpu/max_freq: 306000000 Hz (306 MHz)
[INFO] /sys/class/devfreq/17000000.gpu/min_freq: 306000000 Hz (306 MHz)
[INFO] /sys/class/devfreq/17000000.gpu/cur_freq: 930750000 Hz (930 MHz)
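For anyone wanting to reproduce the readout above, here is a minimal sketch that reads the standard devfreq attributes and flags the case where `cur_freq` falls outside the requested range. The helper name `check_devfreq` is hypothetical, and the default path is simply the one quoted in this thread; it may differ on other boards or releases.

```shell
#!/usr/bin/env bash
# Sketch only: report whether cur_freq lies inside [min_freq, max_freq].
# check_devfreq is a hypothetical helper name; the default directory is
# the devfreq node quoted in this thread.

check_devfreq() {
    local dir="${1:-/sys/class/devfreq/17000000.gpu}"
    local gov min max cur
    gov=$(cat "$dir/governor") || return 2
    min=$(cat "$dir/min_freq") || return 2
    max=$(cat "$dir/max_freq") || return 2
    cur=$(cat "$dir/cur_freq") || return 2
    echo "governor=$gov min=$min max=$max cur=$cur"
    if [ "$cur" -lt "$min" ] || [ "$cur" -gt "$max" ]; then
        echo "MISMATCH: cur_freq is outside the requested range"
        return 1
    fi
    echo "OK: cur_freq is within the requested range"
}
```

Running `check_devfreq` with no argument while the inference script is active should show whether the limits are still the ones that were written, or whether something has changed them.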
My test procedure is as follows:
I first set the GPU to a fixed frequency point, then start the llama inference script, only to find that the GPU frequency is not fixed during inference. You can refer to my previous message for details.
I then modified the llama inference script to fix the GPU frequency (using the same method) during inference, and this time the GPU frequency stayed locked successfully.
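The second procedure (applying the lock again once inference is running) could be sketched roughly as below. This is an illustration under assumptions, not a confirmed fix: `lock_devfreq` and `run_locked` are hypothetical helper names, the delay before re-applying the lock is arbitrary, and the write order is chosen only so that `max_freq` is never written below `min_freq`, which some kernels reject.

```shell
#!/usr/bin/env bash
# Sketch only: pin a devfreq device to one frequency, start a workload,
# then write the same limits again after the workload has started.
# lock_devfreq and run_locked are hypothetical helper names.

lock_devfreq() {
    local dir="$1" freq="$2"
    local cur_min
    cur_min=$(cat "$dir/min_freq") || return 1
    if [ "$freq" -lt "$cur_min" ]; then
        # Lowering: drop min first so max is never below min.
        echo "$freq" > "$dir/min_freq" || return 1
        echo "$freq" > "$dir/max_freq" || return 1
    else
        # Raising or unchanged: lift max first so min is never above max.
        echo "$freq" > "$dir/max_freq" || return 1
        echo "$freq" > "$dir/min_freq" || return 1
    fi
}

run_locked() {
    local dir="$1" freq="$2" delay="$3"
    shift 3
    lock_devfreq "$dir" "$freq" || return 1
    "$@" &                       # start the workload in the background
    local pid=$!
    sleep "$delay"               # give the workload time to start using the GPU
    lock_devfreq "$dir" "$freq"  # write the same limits a second time
    echo "cur_freq after re-applying lock: $(cat "$dir/cur_freq")"
    wait "$pid"                  # return the workload's exit status
}
```

As root, something like `run_locked /sys/class/devfreq/17000000.gpu 306000000 5 python3 infer.py` would mirror the test described above, where `infer.py` stands in for the actual inference script.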
I am now confused about how to reliably ensure the GPU remains fixed at a specified frequency point.
Please advise. Thank you very much.
Hi,
It’s recommended to adjust the clock via our nvpmodel tool instead.
For example, you can set AGX Orin to 15W with GPU clocks equal to 408MHz.
$ sudo nvpmodel -m 1
Thanks.
First of all, thank you so much for your reply. I was devastated to discover that the GPU failed to lock to a fixed frequency, rendering months of my work ineffective. Your response brings me great relief and happiness. Thank you deeply.
However, the solution you provided may not fully resolve my specific issue. Currently, I have set nvpmodel to its highest performance mode (MAXN) because I need the GPU to reach higher frequencies. Yet, within this high-frequency range, I still require fine-grained control over the GPU frequency, rather than relying solely on the nvpmodel setting. This is why I am attempting to use DVFS for precise adjustments.
Could you please advise if I am using the wrong sysfs directories or if there are specific configurations I might have missed?
For memory, in addition to setting the frequency, I also configured mrq_rate_locked. Is there a similar operation or locking mechanism required for the GPU to achieve this level of fine-grained control?
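One way to look for an equivalent lock on other clocks is to list every clock node that exposes an `mrq_rate_locked` file. The sketch below does only that. The default root directory is an assumption on my part (a debugfs clock tree), `list_rate_locks` is a hypothetical helper name, and finding such a file for a GPU clock would not by itself confirm that it behaves like the memory one.

```shell
#!/usr/bin/env bash
# Sketch only: list clock nodes that expose an mrq_rate_locked file,
# together with the current value of that file.
# list_rate_locks is a hypothetical helper name; the default root is an
# ASSUMED debugfs location and may not match your release.

list_rate_locks() {
    local root="${1:-/sys/kernel/debug/bpmp/debug/clk}"
    local f
    find "$root" -name mrq_rate_locked 2>/dev/null | sort | while read -r f; do
        # Print "<clock name> <value>"; the value is empty if unreadable.
        echo "$(basename "$(dirname "$f")") $(cat "$f" 2>/dev/null)"
    done
}
```

Reading debugfs normally requires root, so this would be run as `sudo bash -c 'source ./locks.sh; list_rate_locks'`, with `locks.sh` being whatever file the function is saved in.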
root@ubuntu:/home/orin# jetson_clocks --show
SOC family:tegra234 Machine:NVIDIA Jetson AGX Orin Developer Kit
Online CPUs: 0-7
cpu0: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2188800 CurrentFreq=960000 IdleStates: WFI=1 c7=1
cpu1: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2188800 CurrentFreq=1113600 IdleStates: WFI=1 c7=1
cpu2: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2188800 CurrentFreq=729600 IdleStates: WFI=1 c7=1
cpu3: Online=1 Governor=schedutil MinFreq=115200 MaxFreq=2188800 CurrentFreq=1420800 IdleStates: WFI=1 c7=1
cpu4: Online=1 Governor=schedutil MinFreq=729600 MaxFreq=2188800 CurrentFreq=1344000 IdleStates: WFI=1 c7=1
cpu5: Online=1 Governor=schedutil MinFreq=729600 MaxFreq=2188800 CurrentFreq=1190400 IdleStates: WFI=1 c7=1
cpu6: Online=1 Governor=schedutil MinFreq=729600 MaxFreq=2188800 CurrentFreq=1190400 IdleStates: WFI=1 c7=1
cpu7: Online=1 Governor=schedutil MinFreq=729600 MaxFreq=2188800 CurrentFreq=1420800 IdleStates: WFI=1 c7=1
GPU MinFreq=306000000 MaxFreq=306000000 CurrentFreq=930750000
Active GPU TPCs: 7
EMC MinFreq=204000000 MaxFreq=3199000000 CurrentFreq=2133000000 FreqOverride=0
DLA0_CORE: Online=1 MinFreq=0 MaxFreq=1408000000 CurrentFreq=1408000000
DLA0_FALCON: Online=1 MinFreq=0 MaxFreq=742400000 CurrentFreq=742400000
DLA1_CORE: Online=1 MinFreq=0 MaxFreq=1408000000 CurrentFreq=1408000000
DLA1_FALCON: Online=1 MinFreq=0 MaxFreq=742400000 CurrentFreq=742400000
PVA0_VPS0: Online=1 MinFreq=0 MaxFreq=1190400000 CurrentFreq=1190400000
PVA0_AXI: Online=1 MinFreq=0 MaxFreq=857600000 CurrentFreq=857600000
FAN Dynamic Speed Control=nvfancontrol hwmon0_pwm1=1
NV Power Mode: MAX
GPU MinFreq=306000000 MaxFreq=306000000 CurrentFreq=930750000
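To check this automatically rather than by eye, a small filter can parse the GPU line of the `jetson_clocks --show` output and compare the three fields. This is a sketch that assumes the line format shown above; `check_gpu_line` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: read `jetson_clocks --show` output on stdin and compare
# CurrentFreq against MinFreq/MaxFreq on the first GPU line.
# check_gpu_line is a hypothetical helper name; the field names are taken
# from the output pasted in this thread.

check_gpu_line() {
    awk '
        /^GPU MinFreq=/ {
            # Split each KEY=VALUE field into the val[] table.
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                val[kv[1]] = kv[2]
            }
            found = 1
            exit
        }
        END {
            if (!found) { print "no GPU line found"; exit 2 }
            cur = val["CurrentFreq"] + 0
            if (cur < val["MinFreq"] + 0 || cur > val["MaxFreq"] + 0) {
                print "GPU outside limits: cur=" val["CurrentFreq"] \
                      " min=" val["MinFreq"] " max=" val["MaxFreq"]
                exit 1
            }
            print "GPU within limits: cur=" val["CurrentFreq"]
        }
    '
}
```

Usage would be `sudo jetson_clocks --show | check_gpu_line`. A non-zero exit status means the reported current frequency is outside the configured range, as in the output above.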
Hi,
We have a customized tool for nvpmodel:
You can set a custom mode for your requirement.
Thanks.