Overclocking doesn't work on Maxwell GPUs

BlueGoliath · December 16, 2025, 1:28pm

Apparently the roller coaster never ends.

On Blackwell, performance limiters are just broken. Even if a GPU is being throttled due to hitting power limits, nothing is reported by any NVML-based application:

(normally it’d be running at around 2950 MHz)

And yes, that includes nvidia-smi.

Hilariously, if the GPU is “idle”, then the performance limiters report “power”. Blackwell has been a thing for the desktop for almost a year now and no one at Nvidia has noticed and fixed it. Incredible, really.
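
For anyone following along: the value these calls hand back is a bitmask. Below is a minimal sketch of decoding it in C. It assumes the bit values match the nvmlClocksEventReason* defines in nvml.h (written from memory, so check them against the header you build with). `describe_reasons` and the lowercase names are made up for illustration, and nothing here talks to the driver.

```c
#include <assert.h>
#include <stdio.h>

/* ASSUMPTION: bit values as recalled from nvml.h (nvmlClocksEventReason*).
   Verify against your copy of the header before relying on them. */
struct reason_bit {
    unsigned long long bit;
    const char *name; /* hypothetical short label, not an NVML identifier */
};

static const struct reason_bit REASON_BITS[] = {
    { 0x001ULL, "gpu_idle" },
    { 0x002ULL, "applications_clocks_setting" },
    { 0x004ULL, "sw_power_cap" },
    { 0x008ULL, "hw_slowdown" },
    { 0x010ULL, "sync_boost" },
    { 0x020ULL, "sw_thermal_slowdown" },
    { 0x040ULL, "hw_thermal_slowdown" },
    { 0x080ULL, "hw_power_brake_slowdown" },
    { 0x100ULL, "display_clock_setting" },
};

/* Writes a comma-separated list of the recognised set bits into out and
   returns how many were written. Writes "none" if no known bit is set. */
static int describe_reasons(unsigned long long mask, char *out, size_t out_len)
{
    size_t used = 0;
    int count = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof REASON_BITS / sizeof REASON_BITS[0]; i++) {
        if (!(mask & REASON_BITS[i].bit))
            continue;
        int n = snprintf(out + used, out_len - used, "%s%s",
                         count ? "," : "", REASON_BITS[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            out[used] = '\0'; /* drop the partially written entry */
            return count;
        }
        used += (size_t)n;
        count++;
    }

    if (count == 0)
        snprintf(out, out_len, "none");
    return count;
}
```

The real use would be feeding it the mask from nvmlDeviceGetCurrentClocksEventReasons. Going by the behaviour described above, a power-throttled Blackwell card would presumably decode to "none", while an idle one would include "sw_power_cap".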

But hey, maybe Nvidia has a new API for getting this information and no one (including Nvidia themselves) has updated to it yet. Let’s check the NVML header!

github.com/NVIDIA/nvidia-settings, src/nvml.h at 9da030bed:

          *         - \ref NVML_ERROR_UNKNOWN           on any unexpected error
          *
          * @see nvmlClocksEventReasons
          * @see nvmlDeviceGetCurrentClocksEventReasons
          */
          nvmlReturn_t DECLDIR nvmlDeviceGetSupportedClocksEventReasons(nvmlDevice_t device, unsigned long long *supportedClocksEventReasons);
          
          /**
          * @deprecated Use \ref nvmlDeviceGetSupportedClocksEventReasons instead
          */
          DEPRECATED(13.0) nvmlReturn_t DECLDIR nvmlDeviceGetSupportedClocksThrottleReasons(nvmlDevice_t device, unsigned long long *supportedClocksThrottleReasons);
          
          /**
          * @deprecated Use \ref nvmlDeviceGetPerformanceState. This function exposes an incorrect generalization.
          *
          * Retrieve the current performance state for the device.
          *
          * For Fermi &tm; or newer fully supported devices.
          *
          * See \ref nvmlPstates_t for details on allowed performance states.
          *

Oh, the old function that almost every single application is using is deprecated. Surely Nvidia replaced it with something meaningful to warrant such a deprecation, right?

nvmlReturn_t DECLDIR nvmlDeviceGetSupportedClocksEventReasons(nvmlDevice_t device, unsigned long long *supportedClocksEventReasons);

It’s literally a name change. The signature is identical, so this could have been an implementation change behind the old name. And in practice there is zero functional difference between the two: the new function is still broken in exactly the same way the throttle variant was.

What is even the point of deprecating the old version? Is Nvidia letting an intern using ChatGPT make design decisions without a sign off from someone higher up?

You won’t give third-party developers access to your better low-level APIs, but you’ll waste everyone’s time, including your own, creating nonsensical, poorly documented public APIs that do not work and make it impossible for third-party developers to support everything. Like what even is this:

github.com/NVIDIA/nvidia-settings, src/nvml.h at 9da030bed:

          *
          * NVML_PERF_POLICY_POWER             -> NVML_FI_DEV_CLOCKS_EVENT_REASON_SW_POWER_CAP
          * NVML_PERF_POLICY_THERMAL           -> NVML_FI_DEV_CLOCKS_EVENT_REASON_SW_THERM_SLOWDOWN
          * NVML_PERF_POLICY_SYNC_BOOST        -> NVML_FI_DEV_CLOCKS_EVENT_REASON_SYNC_BOOST
          * NVML_PERF_POLICY_BOARD_LIMIT       -> NVML_FI_DEV_PERF_POLICY_BOARD_LIMIT
          * NVML_PERF_POLICY_LOW_UTILIZATION   -> NVML_FI_DEV_PERF_POLICY_LOW_UTILIZATION
          * NVML_PERF_POLICY_RELIABILITY       -> NVML_FI_DEV_PERF_POLICY_RELIABILITY
          * NVML_PERF_POLICY_TOTAL_APP_CLOCKS  -> DEPRECATED, Do not use
          * NVML_PERF_POLICY_TOTAL_BASE_CLOCKS -> NVML_FI_DEV_PERF_POLICY_TOTAL_BASE_CLOCKS
          */
          DEPRECATED(13.0) nvmlReturn_t DECLDIR nvmlDeviceGetViolationStatus(nvmlDevice_t device, nvmlPerfPolicyType_t perfPolicyType, nvmlViolationTime_t *violTime);
          
          /**
          * Gets the device's interrupt number
          *
          * @param device                               The identifier of the target device
          * @param irqNum                               The interrupt number associated with the specified device
          *
          * @return
          *         - \ref NVML_SUCCESS                 if irq number is successfully retrieved
          *         - \ref NVML_ERROR_UNINITIALIZED     if the library has not been successfully initialized

You deprecated this in CUDA 13 and have it marked for removal in 14. Why?

And then you have other nonsense like this:

github.com/NVIDIA/nvidia-settings, src/nvml.h at 9da030bed:

          * Retrieves the temperature threshold for the GPU with the specified threshold type in degrees C.
          *
          * For Kepler &tm; or newer fully supported devices.
          *
          * See \ref nvmlTemperatureThresholds_t for details on available temperature thresholds.
          *
          * Note: This API is no longer the preferred interface for retrieving the following temperature thresholds
          * on Ada and later architectures: NVML_TEMPERATURE_THRESHOLD_SHUTDOWN, NVML_TEMPERATURE_THRESHOLD_SLOWDOWN,
          * NVML_TEMPERATURE_THRESHOLD_MEM_MAX and NVML_TEMPERATURE_THRESHOLD_GPU_MAX.
          *
          * Support for reading these temperature thresholds for Ada and later architectures would be removed from this
          * API in future releases. Please use \ref nvmlDeviceGetFieldValues with NVML_FI_DEV_TEMPERATURE_* fields to retrieve
          * temperature thresholds on these architectures.
          *
          * @param device                               The identifier of the target device
          * @param thresholdType                        The type of threshold value queried
          * @param temp                                 Reference in which to return the temperature reading
          * @return
          *         - \ref NVML_SUCCESS                 if \a temp has been set
          *         - \ref NVML_ERROR_UNINITIALIZED     if the library has not been successfully initialized
          *         - \ref NVML_ERROR_INVALID_ARGUMENT  if \a device is invalid, \a thresholdType is invalid or \a temp is NULL

What is the point of any of this? You have enums for GPU archs. Is whatever terrible AI you’re using not trained on code that uses switch-case statements?
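
To make the switch-case point concrete, here is a rough sketch of the dispatch the header comment now pushes onto every client. It assumes the architecture IDs below mirror the NVML_DEVICE_ARCH_* values in nvml.h (from memory, so verify them) and that "Ada and later" means Ada, Hopper and Blackwell. `pick_threshold_path`, `path_name` and the enum names are hypothetical and only choose which call to make; they don't call NVML.

```c
#include <assert.h>

/* ASSUMPTION: values mirror NVML_DEVICE_ARCH_* in nvml.h as recalled;
   check them against the header you build with. */
enum gpu_arch {
    ARCH_KEPLER = 2,
    ARCH_MAXWELL = 3,
    ARCH_PASCAL = 4,
    ARCH_VOLTA = 5,
    ARCH_TURING = 6,
    ARCH_AMPERE = 7,
    ARCH_ADA = 8,
    ARCH_HOPPER = 9,
    ARCH_BLACKWELL = 10
};

/* Hypothetical labels for the two query routes the header comment describes. */
enum threshold_path {
    PATH_LEGACY_THRESHOLD_CALL, /* nvmlDeviceGetTemperatureThreshold */
    PATH_FIELD_VALUES,          /* nvmlDeviceGetFieldValues + NVML_FI_DEV_TEMPERATURE_* */
    PATH_UNSUPPORTED
};

/* Picks the query route for a given architecture ID. */
static enum threshold_path pick_threshold_path(unsigned int arch)
{
    switch (arch) {
    case ARCH_KEPLER:
    case ARCH_MAXWELL:
    case ARCH_PASCAL:
    case ARCH_VOLTA:
    case ARCH_TURING:
    case ARCH_AMPERE:
        return PATH_LEGACY_THRESHOLD_CALL;
    case ARCH_ADA:
    case ARCH_HOPPER:
    case ARCH_BLACKWELL:
        return PATH_FIELD_VALUES;
    default:
        return PATH_UNSUPPORTED;
    }
}

/* Human-readable name of the NVML entry point behind each route. */
static const char *path_name(enum threshold_path p)
{
    switch (p) {
    case PATH_LEGACY_THRESHOLD_CALL: return "nvmlDeviceGetTemperatureThreshold";
    case PATH_FIELD_VALUES:          return "nvmlDeviceGetFieldValues";
    case PATH_UNSUPPORTED:           return "unsupported";
    }
    assert(!"unknown threshold_path value");
    return "unsupported";
}
```

This is the sort of branch that could just as easily live inside nvmlDeviceGetTemperatureThreshold itself, which is the whole complaint.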
