GPU utilization suddenly doubles after 3 and a half days

I have an odd issue. With three game windows and various other applications running, GPU utilization is stable at 18 to 20% for around three days. Then it suddenly jumps to 36 to 38%; the change is not gradual.
Restarting Xorg and then the applications does not bring it back down to 18%. I have to rmmod the nvidia modules and reload them to fix this. In other words, only reloading the kernel modules brings GPU utilization under Xorg back down.
The driver is 390.67, but this has been happening on all 390.xx releases at least.
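
For reference, the reload sequence looks roughly like this (just a sketch; the gdm display manager and the exact module list are assumptions and depend on the setup):

# stop the display manager first so nothing is still using the GPU (gdm here is only an example)
sudo systemctl stop gdm
# unload the dependent modules before the core one
sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
# reload everything and bring the desktop back
sudo modprobe nvidia nvidia_uvm nvidia_modeset nvidia_drm
sudo systemctl start gdm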

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName  "GeForce GTX 1050 Ti"
    Option     "ConnectToAcpid" "0"
    Option     "ConnectedMonitor" "DFP-1"
    # Setting UseNvKmsCompositionPipeline to false didn't help
EndSection

Section "Monitor"
    Identifier "Monitor0"
    Option     "Enable" "true"
EndSection

Section "Screen"
    Identifier  "Screen0"
    Device      "Device0"
    Monitor     "Monitor0"
    Option      "metamodes" "nvidia-auto-select +0+0 {ForceFullCompositionPipeline=On}"
    Option      "TripleBuffer" "on"
EndSection

#Section "ServerFlags"
#    Option "DontVTSwitch" "True"
#EndSection

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
EndSection

nvidia-bug-report.log.gz (245 KB)

Anything else noticeable in that case, e.g. vmem full, excessive sysmem usage?

Video memory never crossed 800 MB out of 4 GB, and system memory is normally at 4 to 5 GB out of 15.6 GB.

Can you give me a list of detailed information I can gather when this happens again? Thank you.
I mean something like “paste the output of the following commands” and such.

Nothing sophisticated, just the output of slabinfo -o for the kernel memory and nvidia-smi for the vmem.
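
If you want to catch the exact moment it jumps, a loop along these lines would do (just a sketch; the log path and interval are arbitrary choices):

#!/bin/sh
# append a timestamped GPU utilization/vmem snapshot plus kernel slab usage every 5 minutes
while true; do
    {
        date
        nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
        slabinfo -o
    } >> /tmp/gpu-util.log
    sleep 300
done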

@generix, it just happened again https://pastebin.com/raw/yq3KfDh7

nvidia-smi 
Sat Jun 23 11:28:21 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  On   | 00000000:01:00.0  On |                  N/A |
| 46%   44C    P0    N/A /  75W |    992MiB /  4032MiB |     34%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     22538      G   /usr/lib/xorg-server/Xorg                    404MiB |
|    0     22775      G   /usr/bin/gnome-shell                         155MiB |
|    0     24270      G   ./el                                         120MiB |
|    0     24271      G   ./el                                         125MiB |
|    0     24272      G   ./el                                         163MiB |
+-----------------------------------------------------------------------------+

It was at 16% last night when I locked the computer. When I unlocked it today, it had jumped to 34%.

Nothing unusual; the Xserver vmem usage rises over time but stays within the normal range. So there is no explanation for why a module unload is necessary to return to normal operation.

I will compile and install Linux kernel 4.14.52 and start counting again in the morning. I've barely updated anything on this computer this year, so the cause should come down to either the kernel or the nvidia driver.

The kernel update did not help. The same issue happened at around 3 days and 15 hours of uptime. Does anyone have any ideas?

@generix, it appears I was hitting the resource leak that 390.87 fixed. All is fine now.