Maximum power draw 3090

We are experiencing some pretty staggering power draws from 3090s during our inference workloads. This doesn't really surprise me, given that GPU preprocessing, TensorRT inference, and GPU post-processing are all happening at the same time, running on multiple threads with unique contexts on each GPU. The GPU load is between 80 and 90 percent. We see significant spikes in the oscilloscope reading of the current.

Is it possible to set up a workflow that will exceed the GPU recommendations? If so, how do we manage those spikes? If the computer is not turning off, is this acceptable?

I'm not sure if this is the correct forum, but any help is appreciated.

From what I see on our 3090, power is managed against temperature, so when running for a longer time it gets slower. I think a high current can also damage the GPU, but in general I think it is the temperature.
This assumption is based on the theory that the current will not be allowed to exceed a limit at which the GPU would immediately self-destruct ;-)
But you should not toy around with overclocking and so on…

Thanks for the info. It is currently not overclocked; everything is stock.

We ran a FurMark comparison and there aren't really too many spikes there. We do see the spikes throughout inference, though, so I am wondering whether different parts of the card are being exercised (tensor cores etc.).

I assume it's impossible to write a program that gets the GPU to have peak power draws over 600 watts? That is what we are seeing (at least extrapolating from the measured current on one 8-pin connector). Maybe the connectors do not draw power at the same time, making this test somewhat unreliable.

Would you expect peak power draws to differ between FurMark and inference workloads when both are at close to 100% GPU load? Is it possible that the individual 8-pin connectors have different current profiles, so that the actual peak power is not far above the loads prescribed for the card?

What’s the time resolution of your scope measurements?

The nominal power rating of a CPU or GPU is based on the average power consumption across tens of seconds and is mostly important for managing thermal issues, i.e. sizing of cooling solutions, which is why it is sometimes called TDP (thermal design power). The same is true for the nominal power rating of PCIe auxiliary power cables, which nominally supply up to 150W per 8-pin connector and up to 75W per 6-pin connector.

Rapid changes in workload intensity in conjunction with dynamic clocking employed by modern high-performance processors (CPUs as well as GPUs) can lead to significant power spikes on the order of microseconds to tens of microseconds. These power spikes can be more pronounced in compute apps than in graphics apps, as these different application classes exercise the functional units of the GPU differently, and are commonly observed with machine learning apps.

So if your oscilloscope can measure with, say, millisecond resolution, then it would be normal to observe such power spikes, though I am a bit surprised that they would reach as high as 600W. While CPU and GPU power spikes are usually not in sync, quasi-simultaneous power spikes can occur and can contribute to a power supply being overwhelmed. When this happens, it most frequently manifests as random reboots a few minutes into running a machine-learning application. This is caused by the power spike leading to a voltage drop (“brown-out”). In more severe cases, the power supply itself may shut down.
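As a side note, it can be useful to cross-check the oscilloscope against what the driver itself reports. Here is a minimal sketch (my own, not anything the original posters used) that logs driver-reported power draw via `nvidia-smi`, assuming it is on the PATH. Keep in mind the driver averages power over its own sampling window, so microsecond-scale transients of the kind discussed above will not be visible in this data; only the scope can catch those.

```python
import subprocess

def parse_power_output(text):
    """Parse nvidia-smi CSV output: one power value per line, one line per GPU."""
    return [float(line) for line in text.splitlines() if line.strip()]

def read_gpu_power_watts():
    """Return the driver-reported power draw (watts) for each GPU.

    Note: this is an averaged reading, not an instantaneous one, so it
    cannot reveal the microsecond spikes visible on an oscilloscope.
    """
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_power_output(out)
```

Calling `read_gpu_power_watts()` in a loop gives a coarse power trace that is handy for confirming average load, but it should never be used to rule out transients.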

A properly sized PSU (power supply unit) is therefore important for HPC systems, including those running AI tasks. My standing recommendation for rock-solid operation across a projected system life span of five years is to size the PSU such that the sum of the nominal power consumption of all system components does not significantly exceed 60% of the nominal power rating of the PSU. Assume 0.4W per GB of DDR4 system memory when summing the nominal power consumption of the system components.
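A quick worked example of that rule of thumb (the component names and wattages below are illustrative assumptions, not measurements from anyone's system; 350W is the nominal limit of the RTX 3090 mentioned later in this thread):

```python
def recommended_psu_watts(component_watts, ddr4_gb=0, headroom=0.60):
    """Apply the sizing rule above: summed nominal component power should
    not significantly exceed 60% of the PSU's nominal rating.
    DDR4 memory is estimated at 0.4 W per GB."""
    total = sum(component_watts.values()) + 0.4 * ddr4_gb
    return total / headroom

# Illustrative build: one RTX 3090 (350 W nominal), a 125 W CPU,
# 64 GB DDR4, and ~50 W for motherboard, drives, and fans.
parts = {"gpu": 350, "cpu": 125, "misc": 50}
print(round(recommended_psu_watts(parts, ddr4_gb=64)))  # -> 918
```

So this hypothetical single-3090 workstation would already call for a PSU in the 900-1000W class under the 60% rule.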

In addition, I recommend paying attention to the 80PLUS rating of the PSU, and using an 80PLUS Gold compliant PSU as the minimum for a high-performance workstation, with 80PLUS Platinum preferred. For a high-performance server, use 80PLUS Platinum as the minimum, with 80PLUS Titanium preferred. PSUs with high 80PLUS ratings are more efficient, tend to run cooler (which helps extend the lifetime of electronic components), are usually designed with higher engineering margins and better-quality components, and often come with longer vendor warranties. The recommendation of a higher 80PLUS level for servers is based on the difference in duty cycle compared with a workstation.


Per this Reddit thread, the 600W spikes you observed with the RTX 3090 are roughly in line with what others have observed:

I had a chat to Seasonic, they let me know that in their labs they have seen RTX 3090 transient loads spike to north of 550W before the power limits kick in and pull them back down.

There are some comments in that thread that NVIDIA and CPU manufacturers “need to get power spikes under control”. CPUs and GPUs already have active power management, but any such mechanism has a finite response time. As best I know, current systems respond within 100 milliseconds, possibly less. While hardware vendors may be able to reduce the response time further (I do not have the expertise to guesstimate what a reasonable lower bound could be), it will never be zero, and therefore power spikes will continue to exist.

[Even later:] This review of the RTX 3090 FE includes oscilloscope pictures in which some narrow 1-millisecond power spikes of up to 570W are visible. They conclude:

For this card, I would therefore budget at least 460 to 500 watts as its share of the total secondary-side power consumption of the system.

Their graph showing the highest power draw observed over various durations seems to suggest that the RTX 3090 power management is capable of reducing power draw to the 350W nominal limit within about 25 milliseconds.