I am running a Deep Learning pipeline using Ultralytics YOLOv8 and PyTorch for object detection on a newly bought workstation with a GeForce RTX 4090.
When I start the script, the GPU's performance state goes to P2 and the pipeline runs at around 7 fps, which is roughly the expected behaviour. However, it quickly drops to state P8 and around 4 fps, and then keeps cycling: back to P2 for a short while, then down to P8 again.
I also have a laptop (on Ubuntu) that can run the same code up to 20% faster, despite having worse specs (CPU and GPU).
The power mode and other settings I could find have all been set to high performance.
I have tried different versions of the relevant software and libraries, with no effect.
Neither temperature nor power draw seems to be the limiting factor, as both stay well below their limits.
I haven't found a solution on the web so far.
Has anyone else experienced similar issues, or does anyone have suggestions on how to maintain higher performance?
The workstation uses Windows 11 Home, with an Intel Core i9-14900KF. The GPU is a GeForce RTX 4090 with driver 555.99 (latest available). I use PyTorch version 2.3.1 with CUDA 12.1.
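For completeness, this is roughly how I confirm from Python that PyTorch is actually using the CUDA 12.1 build and the 4090 (a trivial check, but it rules out a CPU fallback):

```python
import torch

# Confirm PyTorch sees the GPU and was built against the expected CUDA toolkit
print(torch.__version__)              # 2.3.1 in my case
print(torch.version.cuda)             # 12.1
print(torch.cuda.is_available())      # True means CUDA is usable from this environment
print(torch.cuda.get_device_name(0))  # should report the GeForce RTX 4090
```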
Make sure the Windows control panel power settings are configured for high performance. I don't have a recipe for you, but you can find various web postings that discuss this.
That may be a factor. In general, Windows usage of a high-end GeForce RTX GPU is heavily oriented towards gaming performance, power management, and quietness. You might have a better (throughput) experience for a PyTorch workload by running that workstation on Ubuntu. No, I don't mean WSL2, either; that is still a Windows setting. If you do switch to Linux, an even better way to go is to relieve the RTX 4090 of any display-processing chores (use another GPU/display adapter for the display). On Windows, your GPU must constantly switch back and forth between servicing CUDA workloads and servicing display workloads. For a WDDM GPU, there is no getting around this entirely, even if you drive the display from another GPU. Switching to Linux gives you a path to avoid this inefficiency. Once you've done that, you can explore Linux-based methods for maximizing performance, although not all of the options/capabilities are available on a GeForce GPU.
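As one illustration of the kind of thing I mean (a sketch, not a recipe; I have not verified which of these a GeForce board will honor with your particular driver): on Linux you could enable persistence mode and experiment with locked GPU clocks via nvidia-smi, for example driven from a small Python helper:

```python
import subprocess

def nvsmi(*args):
    """Run nvidia-smi with the given arguments and print its output (needs root)."""
    result = subprocess.run(["nvidia-smi", *args], capture_output=True, text=True)
    print(result.stdout or result.stderr)

# Persistence mode keeps the driver initialized between CUDA runs (Linux only).
nvsmi("-pm", "1")

# Optionally pin the GPU clocks to a range in MHz (Volta and newer; may not be
# honored on every GeForce board/driver). The 210,2520 range is only an example
# for an RTX 4090, not a recommendation.
nvsmi("-lgc", "210,2520")

# Inspect the current performance state and clock throttle reasons.
nvsmi("-q", "-d", "PERFORMANCE")
```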
The low GPU power consumption shown above (~20 W) strongly suggests that this GPU is largely idling. The fact that a GPU power-saving mode such as P8 is entered suggests that any bursts of intense GPU usage that may occur are very short in duration.
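If you want to log this over time rather than eyeball it, a minimal sketch using the nvidia-ml-py (pynvml) bindings, run in a separate terminal while your pipeline is active, would look something like this (the one-second interval and 60-sample count are arbitrary):

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Log power draw, performance state, utilization, and SM clock once per second.
for _ in range(60):
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0   # milliwatts -> watts
    pstate  = pynvml.nvmlDeviceGetPerformanceState(handle)      # 0 = P0, 8 = P8, ...
    util    = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # percent
    clock   = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
    print(f"P{pstate}  {power_w:6.1f} W  {util:3d}% util  {clock} MHz SM clock")
    time.sleep(1.0)

pynvml.nvmlShutdown()
```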
Overall light GPU usage can prevent a GPU from ever running with the highest performance settings, as power state switching and GPU clock boosting are not instantaneous but have a certain amount of hysteresis. If you are so inclined, you could explore this behavior in detail (as I did in the past) with a computationally intense kernel of configurable duration that is activated at configurable intervals.
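In PyTorch terms (rather than a hand-written CUDA kernel), such an experiment could look roughly like the sketch below: a burst of matrix multiplications of configurable duration, fired at a configurable interval, while you watch the P-state and clocks in GPU-Z or with the logger above. The sizes and timings are just knobs to play with:

```python
import time
import torch

def burst(duration_s: float, n: int = 4096) -> None:
    """Keep the GPU busy with back-to-back matmuls for roughly duration_s seconds."""
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    start = time.time()
    while time.time() - start < duration_s:
        torch.matmul(a, b)
        torch.cuda.synchronize()  # keep wall-clock time and GPU time roughly in step

# Example: 50 ms of intense work every 500 ms. Short, infrequent bursts like this
# tend to let the GPU fall back to a low power state between launches, while
# longer or denser bursts keep it boosted.
for _ in range(100):
    burst(duration_s=0.05)
    time.sleep(0.45)
```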
On Windows, you can get good visualization of GPU activity with the free tool GPU-Z from TechPowerUp.
So my tentative diagnosis is that whatever apps you are running are not actually making much use of the GPU, and you may want to explore their configuration settings to see whether anything can be done about it. As a sanity check, I would suggest running a CUDA-accelerated app that is known to fully utilize GPUs, such as Folding@home.
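If you prefer to stay within PyTorch for that sanity check, a sustained synthetic load along these lines should keep an RTX 4090 near its maximum clocks and at a few hundred watts of power draw; if the P-state still falls back to P8 under this kind of load, the pipeline itself is not the culprit:

```python
import time
import torch

# Sustained synthetic load: large half-precision matmuls back to back for ~60 s.
# While this runs, power draw and clocks should sit near their maximums; compare
# that against what you observe while the YOLOv8 pipeline is active.
a = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
b = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)

start = time.time()
iters = 0
while time.time() - start < 60:
    torch.matmul(a, b)
    iters += 1
torch.cuda.synchronize()
print(f"{iters} large matmuls in {time.time() - start:.1f} s")
```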