GPU power consumption doesn't follow GPU usage

I’m writing a paper that covers GPU usage and power consumption while running YOLOv8 object detection, and I noticed a weird pattern. In my experiment, object detection starts around frame 50 and continues until frame 100; at that point the GPU usage drops as expected, but the power consumption doesn’t drop until frame 130. That only happens when I’m using the nano version of the YOLO model; it doesn’t happen with the xlarge model. Is there anything about NVIDIA GPUs that could explain this behavior?

Driver version: 565.57.01

CUDA version: 12.7

GPU: Nvidia RTX A5000

What you’re possibly seeing is a threshold effect in the algorithm NVIDIA uses to govern the power-saving mode of the GPU. There are no published details about this, but at idle, the GPU and memory clocks are considerably reduced, and they ramp up when the GPU is presented with a load. Once it enters an idle state again, the clocks are kept up for a period of time before being ramped back down.

Although both loads are quite light, GPU-usage-wise, I wonder if, below a certain point, the clocks are maintained for a longer period for some reason. Out of interest, what is the time duration of the x-axis?

It might be interesting to run the test on a smaller (SM-wise) card, pushing the usage % up.
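If it would help to replot the data against wall-clock time rather than frame index, a minimal logging sketch along these lines could capture it (assuming the pynvml bindings, installable as nvidia-ml-py; the 0.1 s sampling interval is an arbitrary choice):

```python
# Minimal sketch: log GPU utilization and power draw against wall-clock
# time so the power decay can be plotted in seconds rather than frames.
import csv
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

with open("gpu_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t_seconds", "gpu_util_percent", "power_watts"])
    t0 = time.perf_counter()
    while True:  # stop with Ctrl+C once the run is over
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
        watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
        writer.writerow([round(time.perf_counter() - t0, 3), util, watts])
        time.sleep(0.1)
```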

I agree: this looks like a hysteresis effect in the GPU’s power management and is as expected. The details of the GPU power management are not publicly documented, and from observing the behavior over the years, the details appear to keep changing with GPU architecture and driver version.

In particular, when a GPU becomes idle, it doesn’t immediately fall into a low-power mode, in case more work is sent to the GPU within a short time. This is a performance feature, as ramping up to full-power mode is associated with a delay which negatively impacts performance.
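To make that ramp-up delay visible, one could time a small kernel launched after a long idle period against the same kernel in steady state. A hedged sketch, assuming CuPy is installed and that 60 seconds of idling is long enough for the clocks to drop:

```python
# Illustrative only: compare kernel time right after an idle period
# (clocks ramped down) against steady-state ("warm") kernel time.
import time
import cupy as cp

a = cp.random.rand(1024, 1024, dtype=cp.float32)

def timed_matmul():
    start = time.perf_counter()
    _ = a @ a
    cp.cuda.Stream.null.synchronize()  # wait for the kernel to finish
    return time.perf_counter() - start

timed_matmul()          # warm-up: one-time cuBLAS/init costs happen here
time.sleep(60)          # let the power management drop the clocks
cold = timed_matmul()   # first launch after idle: clocks still ramping up
warm = min(timed_matmul() for _ in range(10))  # steady-state reference
print(f"after idle: {cold * 1e3:.2f} ms, steady state: {warm * 1e3:.2f} ms")
```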

Simply idling a GPU while in the full-power state causes considerable power draw due to the complexity of the device, i.e. a large number of transistors. Depending on the GPU model, I have seen between 30 W and 50 W. You seem to be using a high-end GPU, so the 60 W seen in the graph does not seem out of the ordinary.

Once the GPU power management drops a GPU (usually in stages) down to the power-saving mode with the lowest power consumption, its operating frequency drops drastically (e.g. to 300 MHz), the voltage is reduced considerably (usually to around 0.7 V), the PCIe link is downgraded, etc. In other words, all possible measures are taken to reduce power consumption, which often drops power draw into the single-digit watt range from what I have seen. For a high-end GPU it might be a bit higher; the graph seems to show a value around 17 W, which seems very plausible.
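Those staged transitions can be watched directly through NVML. A small sketch, again assuming the pynvml bindings, that samples the performance state, clocks, and power once per second after a run finishes:

```python
# Sketch: after a burst of GPU work ends, sample the performance state
# and clocks once per second to watch the staged ramp-down.
import time
import pynvml

pynvml.nvmlInit()
h = pynvml.nvmlDeviceGetHandleByIndex(0)

for _ in range(60):
    pstate = pynvml.nvmlDeviceGetPerformanceState(h)  # P0 (max) .. P8+ (idle)
    sm_clk = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_SM)
    mem_clk = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_MEM)
    mw = pynvml.nvmlDeviceGetPowerUsage(h)            # milliwatts
    print(f"P{pstate}  sm={sm_clk} MHz  mem={mem_clk} MHz  power={mw / 1000:.1f} W")
    time.sleep(1.0)

pynvml.nvmlShutdown()
```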


Thanks @njuffa and @rs277 for the explanations. I followed your tip of analyzing the data with time on the x-axis, and it turns out that both models have the same behavior in terms of power usage: both continue to draw power for about 18 seconds after the GPU becomes idle, but in my original graph that time span is unnoticeable for the blue line. I’ve attached a new graph that illustrates this.

The 15% GPU utilization shown in the graph seems odd for a GPU-accelerated application. With utilization that low, the GPU might actually be operating not in the highest performance state but in the next state down. I don’t know the power states and associated frequency ranges for this particular GPU, but from my observations, the highest-performance state typically has the GPU operating at 1600-1800 MHz, while the next state down has it operating at 1000-1200 MHz.

Low GPU utilization in a GPU-accelerated app would indicate poor software optimization to me. If this is your own code, I would suggest some serious profiling to see what is going on. If this is third-party software, I would suggest discussing it with the vendor. In either scenario it is possible that the GPU is not being fed fast enough by the host platform, so you might want to investigate that. My long-standing recommendation is to use CPUs with at least a 3.5 GHz base frequency to avoid bottlenecking GPU-accelerated code on single-thread CPU performance in the host portion of the code.
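One way to check whether the host is the bottleneck is to profile a few inference iterations. This is only an illustrative sketch with PyTorch’s built-in profiler; `model` and `frame` are placeholders for your YOLOv8 model and input batch:

```python
# Illustrative only: profile a handful of inference steps to see how much
# time is spent on the CPU side vs. in CUDA kernels. Assumes PyTorch;
# `model` and `frame` stand in for your actual model and input tensor.
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        for _ in range(20):
            model(frame)

# Large CPU totals relative to CUDA totals suggest the GPU is being starved.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))
```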