GPU utilization suddenly doubles after 3 and a half days

I have an odd issue. With three game windows and various other applications running, GPU utilization is stable at 18 to 20% for around three days. Then it suddenly jumps to 36 to 38%; the change is not gradual.
Restarting Xorg and then the applications does not bring it back down to 18%. I have to rmmod the nvidia modules and reload them to fix this. In other words, only reloading the kernel modules brings GPU utilization under Xorg back down.
The driver is 390.67, but this has been happening on all 390.xx releases at least.
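
For reference, the reload sequence looks roughly like this (just a sketch; the gdm display manager and the exact module list are assumptions and depend on the setup):

# stop the display manager first so nothing is still using the GPU (gdm here is only an example)
sudo systemctl stop gdm
# unload the dependent modules before the core one
sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
# reload everything and bring the desktop back
sudo modprobe nvidia nvidia_uvm nvidia_modeset nvidia_drm
sudo systemctl start gdm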

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName  "GeForce GTX 1050 Ti"
    Option     "ConnectToAcpid" "0"
    Option     "ConnectedMonitor" "DFP-1"
    # Setting UseNvKmsCompositionPipeline to false didn't help
EndSection

Section "Monitor"
    Identifier "Monitor0"
    Option     "Enable" "true"
EndSection

Section "Screen"
    Identifier  "Screen0"
    Device      "Device0"
    Monitor     "Monitor0"
    Option      "metamodes" "nvidia-auto-select +0+0 {ForceFullCompositionPipeline=On}"
    Option      "TripleBuffer" "on"
EndSection

#Section "ServerFlags"
#    Option "DontVTSwitch" "True"
#EndSection

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
EndSection

nvidia-bug-report.log.gz (245 KB)

Anything else noticeable in that case, e.g. vmem full, excessive sysmem usage?

Video memory never crossed 800 MB out of 4 GB, and system memory is normally at 4 to 5 GB out of 15.6 GB.

Can you give me a list of detailed information I can gather when this happens again? Thank you.
I mean something like “paste the output of the following commands” and such.

Nothing sophisticated, just the output of slabinfo -o for the kernel memory and nvidia-smi for the vmem.
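
If you want to catch the exact moment it jumps, a loop along these lines would do (just a sketch; the log path and interval are arbitrary choices):

#!/bin/sh
# append a timestamped GPU utilization/vmem snapshot plus kernel slab usage every 5 minutes
while true; do
    {
        date
        nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
        slabinfo -o
    } >> /tmp/gpu-util.log
    sleep 300
done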

@generix, it just happened again https://pastebin.com/raw/yq3KfDh7

nvidia-smi 
Sat Jun 23 11:28:21 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  On   | 00000000:01:00.0  On |                  N/A |
| 46%   44C    P0    N/A /  75W |    992MiB /  4032MiB |     34%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     22538      G   /usr/lib/xorg-server/Xorg                    404MiB |
|    0     22775      G   /usr/bin/gnome-shell                         155MiB |
|    0     24270      G   ./el                                         120MiB |
|    0     24271      G   ./el                                         125MiB |
|    0     24272      G   ./el                                         163MiB |
+-----------------------------------------------------------------------------+

It was at 16% last night when I locked the computer. When I unlocked it today, it had jumped to 34%.

Nothing unusual; the Xserver vmem usage rises over time but stays within the normal range. So there is no explanation for why a module unload is necessary to return to normal operation.

I will compile and install Linux kernel 4.14.52 and start counting again in the morning. I've barely updated anything on this computer this year, so the cause should come down to either the kernel or the nvidia driver.

The kernel update did not help. The same issue happened at around 3 days and 15 hours of uptime. Does anyone have any ideas?

@generix, it appears I was hitting the resource leak that 390.87 fixed. All is fine now.