Is my RTX 3090 not receiving enough power?

Hi,

After trying for a while to figure out why my GPU was hanging and leaving unkillable processes behind, I figured out that it is likely not a software issue at all but an issue of insufficient power. Most CUDA Samples run and complete successfully, for example, but a handful cause the same issue. Whatever process was running hangs indefinitely and becomes unkillable, and most fields in a looping nvidia-smi call switch to “GPU is lost”. The CPU still functions fine though. To detect and use the GPU again, a hard reset of the machine is needed.

After some snooping, I have figured out that the issue arises whenever the GPU’s power draw exceeds ~130W. When setting the power limit to 100W with nvidia-smi -pl 100, the same CUDA Samples and other programs that hung before now run successfully to completion. Slowing down the clock speed also has this effect, as long as it’s slow enough that the power draw does not exceed that same ~130W mark.
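For anyone wanting to reproduce the ~130W observation, here is a minimal sketch of how the power draw could be logged until the GPU drops out. It assumes `nvidia-smi` is on the PATH and only reads the first GPU. The function names (`parse_power_line`, `peak_draw`, `poll`) are made up for this example, and the polling interval is arbitrary. Note that polled readings are sampled, so short spikes between samples may be missed.

```python
import subprocess
import time

# Query flags are standard nvidia-smi options; the choice of fields is this sketch's assumption.
QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_power_line(line):
    """Parse 'draw, limit' (watts) into floats; return None if a field is not numeric,
    e.g. when nvidia-smi reports '[GPU is lost]'."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None


def peak_draw(lines):
    """Highest power draw seen before the first unreadable line (None if no reading)."""
    peak = None
    for line in lines:
        parsed = parse_power_line(line)
        if parsed is None:
            break
        if peak is None or parsed[0] > peak:
            peak = parsed[0]
    return peak


def poll(interval_s=0.5, timeout_s=5.0):
    """Print readings until nvidia-smi stops returning numbers; return the lines seen."""
    seen = []
    while True:
        try:
            result = subprocess.run(
                QUERY, capture_output=True, text=True, timeout=timeout_s
            )
        except (subprocess.TimeoutExpired, FileNotFoundError):
            break
        lines = result.stdout.strip().splitlines()
        line = lines[0] if lines else ""  # first GPU only
        seen.append(line)
        if parse_power_line(line) is None:
            break
        print(line, flush=True)
        time.sleep(interval_s)
    return seen
```

Running `peak_draw(poll())` in one terminal while a failing CUDA Sample runs in another should give a rough figure for the last draw recorded before the GPU is lost.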

I learned that NVIDIA recommends a PSU of at least 750W for an RTX 3090, as its default power draw limit is 350W (though in practice it seems it can go a bit higher). The PSU I have is only 600W. Before I buy a new, more powerful PSU, does this behavior/diagnosis make sense? Isn’t 130W a lower threshold than you’d expect, even with the slightly underpowered PSU? Why doesn’t it seem to affect the CPU? I didn’t build this PC myself either, so I want to make sure I’m doing the right thing before I change it.

Thanks in advance for any help.

The 3090 draws from several power rails, each with its own fuse. So while the overall power usage doesn’t seem high, one rail might run into an over-current situation and trip its fuse. The CPU/mainboard uses its own power rail, so it is unaffected.
Did you already try to reseat the power connectors?
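To put rough numbers on the per-rail argument, here is a small sketch that converts watts to amps at 12 V and compares the observed ~130W against the nominal PCIe delivery limits (75 W from the slot, 150 W per 8-pin connector). How the card actually splits its load across the slot and connectors is not known here, so this is only a ballpark, and `amps` is a helper name invented for the example.

```python
RAIL_VOLTAGE = 12.0  # GPU power is delivered almost entirely at 12 V

# Nominal PCIe power delivery limits in watts.
NOMINAL_LIMITS_W = {
    "PCIe slot": 75.0,
    "8-pin connector": 150.0,
}


def amps(watts, volts=RAIL_VOLTAGE):
    """Current in amps needed to deliver `watts` at `volts`."""
    return watts / volts


# Observed failure threshold and the card's default power limit, from the thread.
for label, watts in (("observed threshold", 130.0), ("default limit", 350.0)):
    print(f"{label}: {watts:.0f} W = {amps(watts):.1f} A at 12 V")

for name, limit in NOMINAL_LIMITS_W.items():
    print(f"{name}: {limit:.0f} W = {amps(limit):.2f} A at 12 V")
```

The ~130W total works out to about 10.8 A, which is below the 12.5 A that a single 8-pin connector is nominally rated for. A healthy supply should not trip at that level, which points at the PSU or one of its rails rather than the card.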

I disconnected and reconnected the power connectors; is that the same as reseating? The GPU would not do anything without both (6+2)-pin connectors plugged in, so I took that to confirm that at least some level of power is being received through them. Since making this post, I have also confirmed that the same 3090 works fine in a different desktop and can draw over 300W there (that PC also has a 600W supply but perhaps has fewer peripherals running on that same supply). I have a 1070 in the 3090’s original spot now, and it works, but I haven’t gotten it to draw above 130W yet. I’m curious whether it will fail in the same way as the 3090 if it does.

Yes.
Seems to me the PSU (or one of its rails) is simply broken.

That would likely indicate your PSU is indeed too small.

Yes. I’ve seen the same on a dual Titan setup: switching up from a 750W to a 1.5kW PSU resolved the issues.

Different rails.