This is what I get when I run nvidia-smi. Is this normal? I am starting out with AI and have read that more VRAM allows a bigger batch size and helps lower training time. So roughly 1.5GB of VRAM (1532MiB) in use out of 8GB total looks kinda sketchy. If this is unusual, what should I do?
P.S. I do have a 4K monitor set to 60FPS and thought that might’ve been the reason. I set it to 1080p60 and rebooted, but it still used ~1GB+ VRAM.
─┬─[ pts/3 0 21-04-10 11:45:22 ]
├─[ flameboi: atheistd ▶ /home/atheistd ]
╰─> nvidia-smi
Sat Apr 10 11:45:29 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67 Driver Version: 460.67 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3070 Off | 00000000:09:00.0 On | N/A |
| 41% 47C P8 16W / 220W | 1532MiB / 7979MiB | 5% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2269 G /usr/lib/xorg/Xorg 102MiB |
| 0 N/A N/A 3164 G /usr/lib/xorg/Xorg 917MiB |
| 0 N/A N/A 3741 G /usr/bin/gnome-shell 267MiB |
| 0 N/A N/A 297921 G ...AAAAAAAAA= --shared-files 226MiB |
| 0 N/A N/A 304816 G gnome-control-center 3MiB |
| 0 N/A N/A 304931 G /usr/bin/nvidia-settings 0MiB |
+-----------------------------------------------------------------------------+
─┬─[ pts/3 2 21-04-10 11:48:31 ]
├─[ flameboi: atheistd ▶ /home/atheistd ]
╰─> apt list --installed | grep nvidia | grep driver
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
nvidia-driver-460/focal,now 460.67-1pop0~1616430777~20.04~71e1ad1 amd64 [installed]
─┬─[ pts/3 0 21-04-10 11:48:46 ]
├─[ flameboi: atheistd ▶ /home/atheistd ]
╰─> apt list --installed | grep nvidia | grep xorg
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
xserver-xorg-video-nvidia-460/focal,now 460.67-1pop0~1616430777~20.04~71e1ad1 amd64 [installed,automatic]
─┬─[ pts/3 0 21-04-10 11:45:29 ]
├─[ flameboi: atheistd ▶ /home/atheistd ]
╰─> uname -a
Linux flameboi 5.11.0-7612-generic #13~1617215757~20.04~97a8d1a-Ubuntu SMP Thu Apr 1 21:15:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
─┬─[ pts/3 0 21-04-10 11:46:01 ]
├─[ flameboi: atheistd ▶ /home/atheistd ]
╰─> lsb_release -a
No LSB modules are available.
Distributor ID: Pop
Description: Pop!_OS 20.04 LTS
Release: 20.04
Codename: focal
You have some scaling set, 5120x2880->3840x2160, though I don’t think that should have the impact on vmem usage you’re experiencing.
The monitors get requeried in fast succession, I think I’ve seen this before on PopOS. Was some plugin/config service which had to be disabled IIRC.
It most probably doesn’t, but at this point I’m not sure. As I previously mentioned, I set the monitor resolution to 1080p60 and rebooted. After a while of getting back into my regular workflow, the VRAM usage was back to its “normal high” of ~1.3GiB.
Also, I have the resolution set to 3840x2160 with fractional scaling at 150% in gnome-control-center; is that why my scaled resolution is 5120x2880? And is this normal? I saw the same resolution in OBS too. Running xrandr --current and xdpyinfo | grep dimensions confirms that it is indeed 5120x2880 pixels. But my monitor’s menu shows that it is receiving input at 3840x2160@60 (which is also the optimal resolution the menu suggests). Maybe the scaling from 5120x2880@60 to 3840x2160@60 is causing higher than normal VRAM usage?
Talking about plugins, I do have one gnome extension at the moment, which shows network speed. But I reinstalled the OS several times (on the same hardware, always Pop!_OS 20.04 LTS), a few of those times without installing said gnome extension, and saw the same high VRAM usage. Based on that experience I’m 99% sure it’s not a memory leak from that particular gnome extension, and I do not have any other user-installed extensions (there are some pre-installed ones like the pop-shell tiling extension etc). So I’m not sure where to look. If you do remember it at a later point in time, please let me know. TIA! ;)
I found the thread https://forums.developer.nvidia.com/t/high-cpu-usage-on-xorg-when-the-external-monitor-is-plugged-in/169173
Though the symptom in that thread was high CPU usage on hybrid graphics, the underlying cause looked the same: fast monitor requeries. Unfortunately, no solution was found there, only that this was popOS specific.
You could try disabling the power-daemon and plugin, which was my last guess in that thread.
Fractional scaling is Ubuntu specific; I have no experience with how well it works by now or with its impact on performance/vmem usage.
From the mentioned thread, I narrowed the problem down to the System76 Power gnome extension (which was active in the screenshot above). I disabled it and rebooted, but the problem still persists.
─┬─[ pts/0 0 21-04-10 20:32:03 ]
├─[ flameboi: atheistd ▶ /home/atheistd ]
╰─> nvidia-smi %
Sat Apr 10 20:32:04 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67 Driver Version: 460.67 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3070 Off | 00000000:09:00.0 On | N/A |
| 41% 42C P8 16W / 220W | 1073MiB / 7979MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2285 G /usr/lib/xorg/Xorg 102MiB |
| 0 N/A N/A 3295 G /usr/lib/xorg/Xorg 349MiB |
| 0 N/A N/A 4116 G /usr/bin/gnome-shell 404MiB |
| 0 N/A N/A 5368 G ...AAAAAAAAA= --shared-files 202MiB |
+-----------------------------------------------------------------------------+
─┬─[ pts/0 0 21-04-10 20:32:04 ]
├─[ flameboi: atheistd ▶ /home/atheistd ]
╰─> uptime %
20:33:53 up 2 min, 1 user, load average: 0.44, 0.49, 0.22
I do not know what you mean by that. Could you point me to an article either explaining “fast monitor requeries” or an article about how to prevent that? TIA!
Edit: My setup is a Ryzen 9 (no iGPU) with an RTX 3070 on a desktop, so I don’t think the System76 Power package would cause any issues, as there is only one display and only one GPU.
By monitor requeries I mean that something is calling xrandr in a loop. This can be seen in the Xorg logs (it can also be caused by a broken cable, but that has other effects as well):
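A rough way to look for such a loop is to scan the Xorg log for bursts of the same kind of line within a few seconds. The sketch below is only that, a sketch: it relies on Xorg log lines starting with a bracketed uptime stamp such as `[  1234.567]`, while the function name `find_bursts`, the search pattern and the thresholds are hypothetical and need to be adapted to whatever your own log prints when a monitor is re-probed.

```python
import re

# Xorg log lines begin with a bracketed uptime stamp in seconds, e.g. "[  1234.567] ..."
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")

def find_bursts(log_text, pattern, window_s=10.0, min_hits=5):
    """Return (start_time, hits) for each span of at most window_s seconds
    that contains at least min_hits log lines matching `pattern`.

    `pattern` is an assumption: pick a string that your own log prints
    whenever the monitor is re-probed.
    """
    times = []
    for line in log_text.splitlines():
        m = STAMP.match(line)
        if m and re.search(pattern, line):
            times.append(float(m.group(1)))

    bursts = []
    i = 0
    while i < len(times):
        # Extend j over every match that falls inside the window starting at times[i].
        j = i
        while j < len(times) and times[j] - times[i] <= window_s:
            j += 1
        if j - i >= min_hits:
            bursts.append((times[i], j - i))
            i = j  # continue after this burst
        else:
            i += 1
    return bursts

# Example use (the path depends on the distro and display manager;
# /var/log/Xorg.0.log and ~/.local/share/xorg/Xorg.0.log are common locations):
#   with open("/var/log/Xorg.0.log") as f:
#       print(find_bursts(f.read(), r"connected"))
```

An empty result does not rule the problem out; it may just mean the pattern does not match what your driver logs.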
@generix, I nuked my Pop install and switched to Ubuntu 20.04 and here’s my nvidia-smi output. Turns out it wasn’t Pop specific. I did a # apt upgrade -y and rebooted to get the latest NV drivers, but it still didn’t help.
Edit: VRAM usage just went up to 1076MiB from startup to replying to this post. :(
Hi @thefirst1322 I have the same issue, and it is not specific to PopOS or NVIDIA hardware – I’ve been able to replicate it with Ubuntu and Arch on both NVIDIA and AMD hardware. The problem is that you are running a very high resolution – 4K is a high resolution to begin with, and with xrandr fractional scaling you are actually running at 5120x2880. Therefore it is not surprising that VRAM usage is high. Either switch to a lower resolution screen, or try KDE. In any case, with 8GB of VRAM I don’t think you need to worry yet.
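To put rough numbers on the resolution argument, here is a minimal back-of-the-envelope sketch. It assumes that GNOME on X11 implements fractional scaling by rendering at the next integer scale factor and letting xrandr scale the result down (which matches the 5120x2880 reported above for 3840x2160 at 150%), and that a screen-sized buffer uses 4 bytes per pixel. The function names are made up for illustration.

```python
import math

def x11_fractional_render_size(native_w, native_h, scale):
    """Render size before xrandr scales down to the panel.

    Assumption: the desktop is rendered at ceil(scale) times the logical
    size, where logical size = native size / scale.
    """
    int_scale = math.ceil(scale)
    return round(native_w / scale * int_scale), round(native_h / scale * int_scale)

def buffer_mib(width, height, bytes_per_pixel=4):
    """Size of one full-screen buffer in MiB (assumes 32-bit pixels)."""
    return width * height * bytes_per_pixel / 2**20

w, h = x11_fractional_render_size(3840, 2160, 1.5)
print(w, h)                    # 5120 2880
print(buffer_mib(w, h))        # 56.25
print(buffer_mib(3840, 2160))  # 31.640625
```

So a single screen-sized buffer at 5120x2880 is about 56 MiB, roughly 1.8 times the size of one at native 4K. The X server and compositor keep several such buffers plus one per window, so usage does grow with resolution, but this arithmetic alone does not say how much of the ~1GiB it accounts for.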
I am experiencing exactly the same issue. I have 2 notebooks running Ubuntu 20.04, both at 1920x1080, and Xorg uses 24G of virtual memory on start. Switching back to the open source, non-NVIDIA driver does not have such an issue, but I need to use the NVIDIA driver for our AI development. I believe the resolution is not the only factor.
This is still an issue for me on Ubuntu 22.04.1 (RTX 2060, driver version 515.76). Any suggestions please? I’d like to give as much VRAM as possible to CUDA, not Xorg.
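For anyone who wants to track how much of the VRAM goes to the desktop versus CUDA, here is a small sketch that sums the per-process rows of the nvidia-smi table by type (G for graphics, C for compute). The regular expression is written against the table layout pasted earlier in this thread and is an assumption about that layout, not an official format; the sample rows are copied from the first post.

```python
import re
from collections import defaultdict

# One row of the "Processes" table: GPU, GI, CI, PID, Type, name, "<n>MiB".
ROW = re.compile(
    r"^\|\s+(?P<gpu>\d+)\s+\S+\s+\S+\s+(?P<pid>\d+)\s+(?P<type>[CG+]+)\s+"
    r"(?P<name>.+?)\s+(?P<mib>\d+)MiB\s+\|$"
)

def memory_by_type(smi_text):
    """Sum per-process GPU memory (MiB) by process type from nvidia-smi text."""
    totals = defaultdict(int)
    for line in smi_text.splitlines():
        m = ROW.match(line.strip())
        if m:
            totals[m.group("type")] += int(m.group("mib"))
    return dict(totals)

# Process rows from the first nvidia-smi output in this thread.
SAMPLE = """\
| 0 N/A N/A 2269 G /usr/lib/xorg/Xorg 102MiB |
| 0 N/A N/A 3164 G /usr/lib/xorg/Xorg 917MiB |
| 0 N/A N/A 3741 G /usr/bin/gnome-shell 267MiB |
| 0 N/A N/A 297921 G ...AAAAAAAAA= --shared-files 226MiB |
| 0 N/A N/A 304816 G gnome-control-center 3MiB |
| 0 N/A N/A 304931 G /usr/bin/nvidia-settings 0MiB |
"""

print(memory_by_type(SAMPLE))  # {'G': 1515}
```

In that first output the graphics processes add up to 1515MiB of the 1532MiB shown in the header. For scripting, `nvidia-smi --query-gpu=memory.used,memory.total --format=csv` gives the totals in a form that is easier to parse than the table.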