I am running a Deep Learning pipeline using Ultralytics YOLOv8 and PyTorch for object detection on a newly bought workstation with a GeForce RTX 4090.
When I start the script, the GPU's performance state goes to P2 and the pipeline runs at around 7 fps, which is roughly the expected behaviour. However, it quickly drops to state P8 and around 4 fps, and then keeps cycling: back to P2 for a short while, then down to P8 again.
I also have a laptop (on Ubuntu) that can run the same code up to 20% faster, despite having worse specs (CPU and GPU).
The power mode and other settings I could find have all been set to high performance.
I have tried different versions of the relevant software and libraries, with no effect.
Neither temperature nor power draw seems to be the limiting factor, as both stay well below their limits.
I haven't found a solution on the web so far.
Has anyone else experienced similar issues, or does anyone have suggestions on how to maintain higher performance?
The workstation uses Windows 11 Home, with an Intel Core i9-14900KF. The GPU is a GeForce RTX 4090 with driver 555.99 (latest available). I use PyTorch version 2.3.1 with CUDA 12.1.
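For completeness, this is roughly how I confirm from Python that PyTorch is actually using the CUDA 12.1 build and the 4090 (a trivial check, but it rules out a CPU fallback):

```python
import torch

# Confirm PyTorch sees the GPU and was built against the expected CUDA toolkit
print(torch.__version__)              # 2.3.1 in my case
print(torch.version.cuda)             # 12.1
print(torch.cuda.is_available())      # True means CUDA is usable from this environment
print(torch.cuda.get_device_name(0))  # should report the GeForce RTX 4090
```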
Make sure the Windows control panel power settings are configured for high performance. I don't have a recipe for you, but you can find various web postings that discuss this.
That may be a factor. In general, Windows usage of a high-end GeForce RTX GPU is heavily oriented towards gaming performance, power management, and quietness. You might have a better (throughput) experience for a PyTorch workload by running that workstation on Ubuntu. No, I don't mean WSL2, either; that is still a Windows setting. If you do switch to Linux, an even better way to go is to relieve the RTX 4090 of any display-processing chores (use another GPU/display adapter for the display). On Windows, your GPU must constantly switch back and forth between servicing CUDA workloads and servicing display workloads. For a WDDM GPU, there is no getting around this entirely, even if you drive the display from another GPU. Switching to Linux gives you a path to avoid this inefficiency. Once you've done that, you can explore Linux-based methods for maximizing performance, although not all of the options/capabilities are available on a GeForce GPU.
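As one illustration of the kind of thing I mean (a sketch, not a recipe; I have not verified which of these a GeForce board will honor with your particular driver): on Linux you could enable persistence mode and experiment with locked GPU clocks via nvidia-smi, for example driven from a small Python helper:

```python
import subprocess

def nvsmi(*args):
    """Run nvidia-smi with the given arguments and print its output (needs root)."""
    result = subprocess.run(["nvidia-smi", *args], capture_output=True, text=True)
    print(result.stdout or result.stderr)

# Persistence mode keeps the driver initialized between CUDA runs (Linux only).
nvsmi("-pm", "1")

# Optionally pin the GPU clocks to a range in MHz (Volta and newer; may not be
# honored on every GeForce board/driver). The 210,2520 range is only an example
# for an RTX 4090, not a recommendation.
nvsmi("-lgc", "210,2520")

# Inspect the current performance state and clock throttle reasons.
nvsmi("-q", "-d", "PERFORMANCE")
```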
The low GPU power consumption shown above (~20 W) strongly suggests that this GPU is largely idling. The fact that a GPU power-saving mode such as P8 is entered suggests that any bursts of intense GPU usage that may occur are very short in duration.
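If you want to log this over time rather than eyeball it, a minimal sketch using the nvidia-ml-py (pynvml) bindings, run in a separate terminal while your pipeline is active, would look something like this (the one-second interval and 60-sample count are arbitrary):

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Log power draw, performance state, utilization, and SM clock once per second.
for _ in range(60):
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0   # milliwatts -> watts
    pstate  = pynvml.nvmlDeviceGetPerformanceState(handle)      # 0 = P0, 8 = P8, ...
    util    = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # percent
    clock   = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
    print(f"P{pstate}  {power_w:6.1f} W  {util:3d}% util  {clock} MHz SM clock")
    time.sleep(1.0)

pynvml.nvmlShutdown()
```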
Overall light GPU usage can prevent a GPU from ever running with the highest performance settings, as power state switching and GPU clock boosting are not instantaneous but have a certain amount of hysteresis. If you are so inclined, you could explore this behavior in detail (as I did in the past) with a computationally intense kernel of configurable duration that is activated at configurable intervals.
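In PyTorch terms (rather than a hand-written CUDA kernel), such an experiment could look roughly like the sketch below: a burst of matrix multiplications of configurable duration, fired at a configurable interval, while you watch the P-state and clocks in GPU-Z or with the logger above. The sizes and timings are just knobs to play with:

```python
import time
import torch

def burst(duration_s: float, n: int = 4096) -> None:
    """Keep the GPU busy with back-to-back matmuls for roughly duration_s seconds."""
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    start = time.time()
    while time.time() - start < duration_s:
        torch.matmul(a, b)
        torch.cuda.synchronize()  # keep wall-clock time and GPU time roughly in step

# Example: 50 ms of intense work every 500 ms. Short, infrequent bursts like this
# tend to let the GPU fall back to a low power state between launches, while
# longer or denser bursts keep it boosted.
for _ in range(100):
    burst(duration_s=0.05)
    time.sleep(0.45)
```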
On Windows, you can get good visualization of GPU activity with the free tool GPU-Z from TechPowerUp.
So my tentative diagnosis is that whatever apps you are running are not actually making much use of the GPU, and you may want to explore their configuration settings to see whether anything can be done about it. As a sanity check, I would suggest running a CUDA-accelerated app that is known to fully utilize GPUs, such as Folding@home.
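If you prefer to stay within PyTorch for that sanity check, a sustained synthetic load along these lines should keep an RTX 4090 near its maximum clocks and at a few hundred watts of power draw; if the P-state still falls back to P8 under this kind of load, the pipeline itself is not the culprit:

```python
import time
import torch

# Sustained synthetic load: large half-precision matmuls back to back for ~60 s.
# While this runs, power draw and clocks should sit near their maximums; compare
# that against what you observe while the YOLOv8 pipeline is active.
a = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
b = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)

start = time.time()
iters = 0
while time.time() - start < 60:
    torch.matmul(a, b)
    iters += 1
torch.cuda.synchronize()
print(f"{iters} large matmuls in {time.time() - start:.1f} s")
```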