Advanced API Performance: SetStablePowerState

Originally published at: https://developer.nvidia.com/blog/advanced-api-performance-setstablepowerstate/

This post covers best practices for using SetStablePowerState on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.

Thanks for the post!
After many tries, using the nvidia-smi --lock options gave us the persistent performance our project requires.

These APIs are very useful when using the GPU as a DSP that needs to work without any frequency lowering or idle behavior.

Related to this post, can you explain the differences between the following performance approaches:

  1. Use “Nvidia Control Panel” GUI settings

  2. Use nvidia-smi command line calls

  3. Use the NVML API

I prefer to use NVML to get persistent GPU behavior. Is it possible to get such persistent behavior using only NVML on a GeForce RTX 3050?

Hi!

That’s great news.

Did you encounter differences between the article’s recommendations and your usage? I’d like to update it to match current uses. nvidia-smi isn’t guaranteed to be a fixed target, unfortunately, and it’s been a while since I wrote this article!

Thanks, Ryan

Hello again-

I’m not familiar with what you’re referring to in item 1 (the control panel). Can you elaborate?

I’m also not certain about the differences between 2 and 3. I suspect on the backend they are poking the same driver components to achieve their goals, because there’s basically one place to set these things. Unfortunately, the best way in the near term to establish this is probably to do a bit of testing.

I’ll try to find the maintainers of these routes and report back if/when I get definitive answers.

Thanks, Ryan

Hello,
This sequence from the post works perfectly to set the GPU core and GPU memory clock frequencies:

nvidia-smi --query-supported-clocks=timestamp,gpu_name,gpu_uuid,memory,graphics --format=csv
nvidia-smi --lock-gpu-clocks=<core_clock_rate_from_csv>
nvidia-smi --lock-memory-clocks=<memory_clock_rate_from_csv>
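
As a rough illustration of the sequence above, here is a minimal Python sketch that reads the CSV produced by the first command and fills in the two lock commands. The sample text, the helper names, and the choice of the highest listed pair are all assumptions made for illustration; the real header and row layout may differ between driver versions, so check your own output first.

```python
import csv
import io

# Made-up sample in the general shape of the CSV output; not real data.
SAMPLE = """timestamp, name, uuid, memory [MHz], graphics [MHz]
2024/01/01 12:00:00.000, Example GPU, GPU-00000000, 7001 MHz, 2100 MHz
2024/01/01 12:00:00.000, Example GPU, GPU-00000000, 7001 MHz, 1800 MHz
2024/01/01 12:00:00.000, Example GPU, GPU-00000000, 405 MHz, 420 MHz
"""


def parse_supported_clocks(csv_text):
    """Return a sorted list of (memory_mhz, graphics_mhz) pairs.

    Assumes memory and graphics are the last two columns, as requested
    in the query above. Rows that do not parse (such as the header)
    are skipped.
    """
    pairs = set()
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) < 2:
            continue
        try:
            mem = int(row[-2].replace("MHz", "").strip())
            gfx = int(row[-1].replace("MHz", "").strip())
        except ValueError:
            continue  # header row or unexpected line
        pairs.add((mem, gfx))
    return sorted(pairs)


def build_lock_commands(pairs):
    """Build the two lock commands for one supported pair.

    Picking the highest pair is an arbitrary choice for this sketch.
    """
    mem, gfx = max(pairs)
    return [
        f"nvidia-smi --lock-gpu-clocks={gfx}",
        f"nvidia-smi --lock-memory-clocks={mem}",
    ]


if __name__ == "__main__":
    for command in build_lock_commands(parse_supported_clocks(SAMPLE)):
        print(command)
```

The sketch only prints the commands rather than running them, since locking clocks typically needs administrator rights and should be a deliberate step.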

There is an additional GPU performance option in:
Windows → “NVIDIA Control Panel” → “3D Settings” → “Manage 3D settings” → “Global Settings” / “Program Settings” → “Power management mode” = “Prefer maximum performance”

Ah, I would not rely on that setting. That’s more of a behavioral suggestion; it doesn’t pin the frequencies.

Excellent, so it looks like the commands are the same. Thanks!

I tried overclocking the GPU by locking the core clock (-lgc) above the default, but it isn’t working. It only locks the core below the stock setting.
Anyone having similar issues or knows a solution?

Hi daky,

The utility is configured to work within design parameters as a safety precaution. There are other tools that will let you set clocks higher, but I wouldn’t recommend it. Changing settings outside of intended values can cause hardware failures, visual glitching, and other incorrect behavior.

Thanks, Ryan

Hey, I did a bit more testing and I can’t apply any overclocks when the GPU is under load, even within the predetermined buckets.

For example, I can run nvidia-smi -lgc 2100 while idling and the command applies immediately. But if I put the GPU under load first (gaming, benchmarking), the core will rise to 2000, and if I then run nvidia-smi -lgc 2100, nothing happens. The core will remain at 2000.
However, lowering the value still works, so if I run nvidia-smi -lgc 1800, the core will drop to 1800 immediately.

I have also tried raising the core delta via Afterburner. That applied without any issues and the GPU happily runs at 2100, so I doubt it is a safety issue: I have seen the GPU run at those numbers for a long time, and all of the values are within the default bucket settings (2160).
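
The check described above (request a clock, then see what the core actually runs at) can be scripted. The following is a minimal Python sketch, assuming nvidia-smi is on the PATH and that the clocks.gr query field reports the current graphics clock in MHz. The helper names and the 15 MHz tolerance are hypothetical choices for illustration, not values from the post.

```python
import subprocess


def parse_clock_lines(text):
    """Parse one integer MHz value per line, skipping non-numeric lines."""
    return [int(line.strip()) for line in text.splitlines()
            if line.strip().isdigit()]


def read_graphics_clocks_mhz():
    """Return the current graphics clock of each GPU, in MHz."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=clocks.gr",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_clock_lines(result.stdout)


def lock_took_effect(requested_mhz, observed_mhz, tolerance_mhz=15):
    """True if the observed clock is within tolerance of the request.

    The tolerance is an assumed margin; clocks may settle on a nearby
    supported step rather than the exact requested value.
    """
    return abs(requested_mhz - observed_mhz) <= tolerance_mhz


if __name__ == "__main__":
    requested = 2100  # the value passed to nvidia-smi -lgc
    for index, observed in enumerate(read_graphics_clocks_mhz()):
        print(index, observed, lock_took_effect(requested, observed))
```

With the numbers from this post, a request of 2100 observed at 2000 would be reported as not applied, while a request of 1800 observed at 1800 would be reported as applied.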

Hi daky,

Sorry for the very late reply!

nvidia-smi is not a general overclocking utility, and it has limited support on GeForce hardware. If you require changing clocks while applications are running, please use whatever means work for your needs. As I said previously, overclocking can lead to temporary misbehavior or permanent damage to hardware.

If you’re seeing the aforementioned behavior on RTX/Quadro or datacenter parts, please let us know.

Thanks,
Ryan