Poor performance after resuming from suspend

When suspending/resuming the PC my graphics performance suffers until I reboot. This has happened since at least the 390 driver, likely earlier; it was present in 415.1 and still happens in 415.18.

AMD Threadripper 1950X
32 GB RAM
NVIDIA GTX 980 Ti

Here’s a practical test:

Fresh boot:

__GL_SYNC_TO_VBLANK=0 glxgears
104085 frames in 5.0 seconds = 20816.875 FPS
105471 frames in 5.0 seconds = 21094.119 FPS
105602 frames in 5.0 seconds = 21120.379 FPS

After suspend/resume:

__GL_SYNC_TO_VBLANK=0 glxgears
67730 frames in 5.0 seconds = 13545.928 FPS
68753 frames in 5.0 seconds = 13750.495 FPS
70471 frames in 5.0 seconds = 14094.160 FPS

I’ve lost ~30% of my 3D performance just by suspending/resuming the PC.
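For reference, here’s a quick way to put a number on the drop (fps_drop is just a throwaway awk helper of mine; the inputs are the averages of the runs above):

```shell
# Throwaway helper: percentage FPS drop between two averages.
fps_drop() {
  # $1 = fresh-boot FPS, $2 = post-resume FPS
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f%%\n", (before - after) / before * 100 }'
}

fps_drop 21010 13797   # averages of the glxgears runs above -> 34.3%
```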

I don’t think it’s a KDE issue, because if I restart the display manager (sudo systemctl restart sddm) I still get poor performance:

__GL_SYNC_TO_VBLANK=0 glxgears
76633 frames in 5.0 seconds = 15326.591 FPS
80297 frames in 5.0 seconds = 16059.361 FPS
78971 frames in 5.0 seconds = 15793.507 FPS
76905 frames in 5.0 seconds = 15380.963 FPS

edit: I’ve also tried reloading the driver:

systemctl stop sddm
rmmod nvidia_drm
rmmod nvidia_modeset
rmmod nvidia
modprobe nvidia
modprobe nvidia_modeset
modprobe nvidia_drm
systemctl start sddm

The performance is still poor compared to a fresh boot. Suspending seems to cause an issue on the hardware that’s only fixed with a reboot.
nvidia-bug-report.log.gz (560 KB)

Please run nvidia-bug-report.sh after resume as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

I’ve attached the after-suspend nvidia-bug-report. I’ll reboot and attach a clean boot version to this post.
nvidia-bug-report.log.gz (1.05 MB)

Nothing obvious besides the missing acpid daemon. Please install it, reboot, and check whether that fixes the issue. If not, suspend/resume, then put the GPU under load and create a new nvidia-bug-report.log.

I’m seeing similar performance drops when Xorg uses a lot of memory due to running Chrome in the background; games then see a large performance drop. Quitting those memory hogs helps. It looks like the driver cannot migrate memory between system RAM and VRAM (or doesn’t want to). You can use nvidia-smi to see which PID uses memory; some applications may allocate memory through Xorg rather than directly.

Is it possible that you are seeing less free VRAM after resume than before suspend? Compare with nvidia-smi before and after. Does performance return after you quit processes hogging memory?
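For example, something along these lines, run once before suspend and once after resume (assumes nvidia-smi is on the PATH; graphics clients like Xorg show up in the plain process table):

```shell
# Sketch: report VRAM usage so the before/after values can be compared.
# Falls back to a notice if nvidia-smi is not installed.
vram_report() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader
    nvidia-smi   # the plain table lists per-process memory, including Xorg
  else
    echo "nvidia-smi not found"
  fi
}
vram_report
```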

BTW: Changing the settings for transparent hugepages in the kernel can make a big difference. Even with a lot of VRAM still available, performance may degrade early when using “transparent_hugepage=always” instead of “…=madvise”, and it can even result in memory allocation errors despite plenty of free VRAM and sysmem.
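You can check which policy is active via the standard sysfs interface (the active value is the bracketed one):

```shell
# Sketch: show the current transparent hugepage policy; the active
# setting is shown in brackets, e.g. "always [madvise] never".
thp_status() {
  f=/sys/kernel/mm/transparent_hugepage/enabled
  if [ -r "$f" ]; then
    cat "$f"
  else
    echo "transparent_hugepage interface not available"
  fi
}
thp_status
# To switch at runtime (as root):
#   echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
```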

I’m late coming back to this, but I haven’t had the time to delve any deeper. I still have the same problem. I have enabled and started the acpid service.

I also tried drm modesetting which didn’t help.

I’ve uploaded a video of the problem here: https://youtu.be/-LLAqIL95N0

The first segment is a fresh boot, the second is after a suspend/resume. No other programs are running in the background other than OBS to do the recording and Lutris/Steam to launch the game. As you can see, the frames and frametimes are all over the place after a suspend/resume.

This doesn’t just affect Wine; it shows up with glxgears too, as above.

As for memory usage, after suspend the Xorg process is using 708 MB, the same as before (not bad considering I have 2x 3840x2160 monitors). Obviously it increases as I open programs. However, closing everything and running only a game has the same effect: fine on boot, awful after resume.

I don’t believe I have transparent_hugepage set. At least according to:

# sysctl -a | grep huge
vm.hugetlb_shm_group = 0
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0

edit: My transparent_hugepage defaults to madvise. Setting it to “never” had no effect.

edit 2: Attached nvidia bug report after suspend while under load (running unigine-heaven)

nvidia-bug-report.log.gz (732 KB)

You’re always getting this on resume:

[29879.536331] smpboot: Booting Node 0 Processor 4 APIC 0x8
[29879.538460] TSC synchronization [CPU#0 -> CPU#4]:
[29879.538461] Measured 408 cycles TSC warp between CPUs, turning off TSC clock.
[29879.538462] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[29879.538467] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[29879.538468] sched_clock: Marking unstable (29879288988313, 248577057)<-(29879651330781, -112866217)

That might affect performance. Have you checked for a BIOS update recently?

I’m on the latest bios and TSC isn’t in use as far as I can tell.

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet

I’ll try different clock sources and see if it makes any difference.
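For anyone following along, the kernel exposes this through the standard sysfs clocksource interface (generic sketch, nothing driver-specific):

```shell
# Sketch: list the clocksources the kernel offers and the one in use.
clocksources() {
  base=/sys/devices/system/clocksource/clocksource0
  if [ -r "$base/available_clocksource" ]; then
    echo "available: $(cat "$base/available_clocksource")"
    echo "current:   $(cat "$base/current_clocksource")"
  else
    echo "clocksource interface not available"
  fi
}
clocksources
# To switch at runtime (as root):
#   echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
```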

edit: On boot it uses tsc, on resume it uses hpet. It looks like this is the source of the problem.

I’m not sure that’s the issue.

Either setting

echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

or setting clocksource=hpet as a kernel parameter so it’s chosen on startup has no effect. As far as I know this means TSC is never used. Even so, I have the same problem: boot is fine, and after suspend performance is worse despite both states now using hpet.

So that doesn’t seem to be the issue. Did you run any general cpu/mem benchmarks to check if this is really only affecting gpu performance?

Fresh boot:

https://browser.geekbench.com/v4/cpu/12016175

After suspend/resume:

https://browser.geekbench.com/v4/cpu/12016209

I’m not sure why the after-suspend run scores slightly higher here, but they’re in the same ballpark. In comparison, graphics loses about 30% and has all kinds of frametime issues.

Actually, looking at that, there is a newer BIOS available, but it just adds support for 2nd-gen CPUs, so I don’t think it will make much difference. I’ll try applying it anyway and see if it helps.

I set tsc=unstable as a kernel parameter to disable it entirely (removing the warning) and force hpet; with that done, the difference between pre-suspend and post-suspend is 8% in glxgears. I’ll try unigine-heaven and see if that reports any differences.

This has definitely improved the situation, but it’s still rather annoying. In SkyrimSE at 4K under Wine, pre-suspend I’m pretty much locked at 60fps in most areas with occasional drops to about 52. Post-suspend the fps is noticeably worse: 60fps is rarely reached, with drops down to 42fps instead of the 52fps pre-suspend.

If it helps, after suspend there are noticeable framerate dips when looking around that aren’t there on a fresh boot, which makes me think it could be a VRAM problem. It doesn’t just happen with SkyrimSE, but it’s definitely most pronounced there of the games I tried.

So with that said, it is perhaps a kwin issue. I’ll try a different desktop environment and see if I have the same problem.

You could run bandwidthTest from the cuda demos to check for that
https://docs.nvidia.com/cuda/demo-suite/index.html
Though that would only show a symptom, not the reason.
Maybe check if there’s some aspm problem by turning it off using kernel parameter pcie_aspm=off
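The ASPM policy currently in effect can be read back to confirm a parameter change took (standard sysfs path, not NVIDIA-specific):

```shell
# Sketch: show the active PCIe ASPM policy; the active value is
# bracketed, e.g. "[default] performance powersave". With pcie_aspm=off
# the parameter file may be absent, hence the fallback message.
aspm_policy() {
  f=/sys/module/pcie_aspm/parameters/policy
  if [ -r "$f" ]; then
    cat "$f"
  else
    echo "pcie_aspm policy interface not available"
  fi
}
aspm_policy
```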

pcie_aspm=off makes no difference. I’ve attached a screenshot of the frametimes in skyrim.

Before suspend the line is flat and the frametimes are consistent. After suspend, the line is flat if I stay looking at the same point, but if I move the mouse around so that the scene being rendered changes, the times spike.

edit: I could try the nouveau driver but the performance is so poor I think I’d struggle to see the difference if there was an issue.

bandwidthTest results (I set the performance mode for both tests in the nvidia-settings GUI, given the notice):

Fresh boot:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 980 Ti
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12802.1

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     7461.4

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     242920.8

Result = PASS

After suspend:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 980 Ti
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12800.3

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     7508.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     242856.3

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

No real differences there unfortunately.