On Blackwell, performance limiters are just broken. Even if a GPU is being throttled due to hitting power limits, nothing is reported by any NVML-based application:
Hilariously, if the GPU is “idle”, then the performance limiters report “power”. Blackwell has been a thing for the desktop for almost a year now and no one at Nvidia has noticed and fixed it. Incredible, really.
But hey, maybe Nvidia has a new API for getting this information and no one (including Nvidia themselves) has updated to it yet. Let's check the NVML header!
Oh, the old function that almost every single application is using is deprecated. Surely Nvidia replaced it with something meaningful to warrant such a deprecation, right?
nvmlReturn_t DECLDIR nvmlDeviceGetSupportedClocksEventReasons(nvmlDevice_t device, unsigned long long *supportedClocksEventReasons);
It’s literally a name change. Nothing is different between the two. This could have been an implementation modification. What is even the point of this?
And it turns out that there is zero functional difference between the two. It’s still entirely broken in the same way the throttle variant was.
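For anyone who wants to check this on their own card, here is a minimal sketch of decoding the reasons bitmask that both the throttle and the event variants hand back. The bit values are local copies that I am assuming match `nvml.h` (verify them against your header), and `decode_clock_event_reasons` is a hypothetical helper name, not something NVML provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Local mirrors of the NVML reason bits. ASSUMPTION: these values match
 * the nvmlClocksEventReason / nvmlClocksThrottleReason defines in nvml.h;
 * check them against the header you actually build with. */
struct reason_name {
    unsigned long long bit;
    const char *name;
};

static const struct reason_name k_reasons[] = {
    { 0x001ULL, "GpuIdle" },
    { 0x002ULL, "ApplicationsClocksSetting" },
    { 0x004ULL, "SwPowerCap" },
    { 0x008ULL, "HwSlowdown" },
    { 0x010ULL, "SyncBoost" },
    { 0x020ULL, "SwThermalSlowdown" },
    { 0x040ULL, "HwThermalSlowdown" },
    { 0x080ULL, "HwPowerBrakeSlowdown" },
    { 0x100ULL, "DisplayClockSetting" },
};

/* Hypothetical helper: writes a '|'-separated list of the recognized
 * reason bits set in `mask` into `out`, or "None" if no recognized bit
 * is set. Returns the number of recognized bits written. */
int decode_clock_event_reasons(unsigned long long mask, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    int count = 0;
    size_t used = 0;
    out[0] = '\0';

    for (size_t i = 0; i < sizeof k_reasons / sizeof k_reasons[0]; ++i) {
        if ((mask & k_reasons[i].bit) == 0)
            continue;

        int n = snprintf(out + used, out_size - used, "%s%s",
                         count ? "|" : "", k_reasons[i].name);
        if (n < 0 || (size_t)n >= out_size - used)
            break; /* buffer too small: stop, output stays NUL-terminated */

        used += (size_t)n;
        ++count;
    }

    if (count == 0)
        snprintf(out, out_size, "None");
    return count;
}
```

In a real program the mask would come from `nvmlDeviceGetCurrentClocksEventReasons(device, &mask)` or the deprecated `nvmlDeviceGetCurrentClocksThrottleReasons(device, &mask)`. Running both through the same decoder is a quick way to see whether they ever disagree.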
What is even the point of deprecating the old version? Is Nvidia letting an intern using ChatGPT make design decisions without a sign-off from someone higher up?
You won’t give third-party developers access to your better low-level APIs, but you’ll waste everyone’s time, including your own, creating nonsensical, poorly documented public APIs that do not work and make it impossible for third-party developers to support everything. Like what even is this:
You deprecated this in CUDA 13 and have it marked for removal in 14. Why?
And then you have other nonsense like this:
What is the point of any of this? You have enums for GPU archs. Is whatever terrible AI you’re using not trained on code that uses switch-case statements?