Non-existent shared VRAM on NVIDIA Linux drivers

I have the same issue with my RTX 3070 and the 555 driver. Playing games causes Xwayland to crash with:

[drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object

I bought an AMD card yesterday. I think this will fix it ;)

I have an NVIDIA 2080 Ti with NVIDIA driver 555.58.2 and the same problem when I run a game with high VRAM usage. XWayland crashes with the same error.

@omarhanykasban706 @Schaufelmeister.Storch @florian.richer

In which games did you guys experience crashes of XWayland? I want to try those games on my 3070 Max-Q and file a bug report on the NVIDIA open modules GitHub page.

OUTBRK on the very high settings. The game is not optimized, so it makes XWayland crash very often.

Since I can’t edit this thread, I will add some clarifications in this reply. What the NVIDIA Linux drivers are missing is GTT (graphics translation table)/GART (graphics address remapping table), which is something like the Shared GPU memory that exists on Windows; GTT/GART is what it is technically called in the Linux kernel.

GTT has been in the Linux kernel for 12 years or more already, and the AMD (amdgpu) and Intel (i915) drivers have implemented it for a long time. It’s baffling that NVIDIA has been unable to implement GTT in their drivers. This is not a brand new feature in constant development like Wayland.

Edit: The GTT feature of the Linux kernel is not the same as nvidia-smi -gtt (GPU Target Temperature).
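For anyone who wants to see what GTT looks like on a driver that implements it: amdgpu exposes VRAM and GTT counters as sysfs files. Below is a minimal Python sketch that reads them. The card0 path is an assumption (the card index differs between machines), and the helper name read_amdgpu_memory is just something I made up for illustration. Nothing equivalent is being claimed here for the NVIDIA driver.

```python
from pathlib import Path

# Memory counters exposed by the amdgpu driver in the DRM device directory.
# Each file holds a single byte count.
FIELDS = (
    "mem_info_vram_used",
    "mem_info_vram_total",
    "mem_info_gtt_used",
    "mem_info_gtt_total",
)


def read_amdgpu_memory(device_dir="/sys/class/drm/card0/device"):
    """Return {field: bytes} for every counter that exists and is readable.

    device_dir is an assumption: the card index may not be 0 on your system.
    """
    result = {}
    for name in FIELDS:
        path = Path(device_dir) / name
        try:
            result[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing (not an amdgpu device) or unexpected content: skip.
            continue
    return result


info = read_amdgpu_memory()
if not info:
    print("No amdgpu memory counters found (wrong card index or not amdgpu).")
for name, value in info.items():
    print(f"{name}: {value / 2**20:.1f} MiB")
```

If mem_info_gtt_used grows while VRAM is full, the driver is spilling allocations into system RAM, which is the behaviour this thread is asking for.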

Note for experienced people: If you have an AMD GPU, you can execute radeontop -d - -l 1 to get something like this:

Dumping to -, line limit 1.
1720222198.786508: bus 01, gpu 0.00%, ee 0.00%, vgt 0.00%, ta 0.00%, sx 0.00%, sh 0.00%, spi 0.00%, sc 0.00%, pa 0.00%, db 0.00%, cb 0.00%, vram 14.08% 574.87mb, gtt 7.33% 289.45mb, mclk 20.00% 0.300ghz, sclk 18.21% 0.214ghz

At the end of the line, you can see this: vram 14.08% 574.87mb, gtt 7.33% 289.45mb. vram is obviously the used VRAM of the GPU and gtt is the system RAM being used by the GPU.
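If you want to log those two values over time, here is a small Python sketch that pulls vram and gtt out of a radeontop dump line. It only assumes the field layout shown in the output above; the function name parse_radeontop_line is mine, not part of radeontop.

```python
import re

# Matches fields such as "vram 14.08% 574.87mb" or "gtt 7.33% 289.45mb".
FIELD_RE = re.compile(r"\b(vram|gtt) (\d+(?:\.\d+)?)% (\d+(?:\.\d+)?)mb")


def parse_radeontop_line(line):
    """Return {"vram": (percent, megabytes), "gtt": (percent, megabytes)}."""
    return {
        name: (float(percent), float(megabytes))
        for name, percent, megabytes in FIELD_RE.findall(line)
    }


# The dump line posted above.
sample = (
    "1720222198.786508: bus 01, gpu 0.00%, ee 0.00%, vgt 0.00%, ta 0.00%, "
    "sx 0.00%, sh 0.00%, spi 0.00%, sc 0.00%, pa 0.00%, db 0.00%, cb 0.00%, "
    "vram 14.08% 574.87mb, gtt 7.33% 289.45mb, mclk 20.00% 0.300ghz, "
    "sclk 18.21% 0.214ghz"
)

for name, (percent, megabytes) in parse_radeontop_line(sample).items():
    print(f"{name}: {megabytes} MB ({percent}%)")
```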

If you have an integrated Intel GPU, open htop (to monitor your RAM usage) and run memtest_vulkan (Release v0.5.0 - Tune behavior with large PCIe BARs · GpuZelenograd/memtest_vulkan · GitHub) on the Intel GPU. You will see your RAM usage increase in less than 1 second. This also applies to AMD users, but I would recommend using the script from this repository instead: GitHub - T-X/linux-amdgpu-radeon-vram-swapping-test: Linux amdgpu Radeon VRAM Swapping Test

In which games did you guys experience crashes of XWayland?

Pretty much any VRAM-intensive game will cause a crash after a while. Try Ready or Not with maximum textures, that one seems to crash quite fast.

Could someone from NVIDIA please tell us whether this issue is going to be addressed (including an ETA if possible)?

It’s a major problem when working with applications that need a lot of VRAM (e.g. Unreal Engine), and my 3060 isn’t cutting it anymore. So I have to decide whether to get a 16GB 40 series card (if this issue gets sorted out) or a 24GB AMD 7900XTX (which apparently gives even more headroom, given that their driver supports shared memory).

What happened to the VRAM management with the 555 drivers? Sometimes Xwayland starts to consume too much VRAM and crashes all the open Xwayland apps. And I’m not even playing anything really intensive, I’m just using Krita.

[26484.710101] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002b00] Failed to allocate NVKMS memory for GEM object
[26484.710166] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002b00] Failed to allocate NVKMS memory for GEM object
[26484.741552] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002b00] Failed to allocate NVKMS memory for GEM object
[26484.741628] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002b00] Failed to allocate NVKMS memory for GEM object
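In case it helps anyone put numbers in a bug report, here is a rough Python sketch that counts these allocation failures per GPU ID. The pattern is taken from the log lines in this thread; the function name count_nvkms_failures is hypothetical, and the sample list just stands in for real kernel log output.

```python
import re
from collections import Counter

# Pattern copied from the error lines posted in this thread.
ERROR_RE = re.compile(
    r"\[GPU ID (0x[0-9a-fA-F]+)\] Failed to allocate NVKMS memory for GEM object"
)


def count_nvkms_failures(lines):
    """Count NVKMS allocation failures per GPU ID in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: two lines from this thread, one from another GPU, one unrelated.
sample_lines = [
    "[26484.710101] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] "
    "*ERROR* [nvidia-drm] [GPU ID 0x00002b00] Failed to allocate NVKMS memory for GEM object",
    "[26484.710166] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] "
    "*ERROR* [nvidia-drm] [GPU ID 0x00002b00] Failed to allocate NVKMS memory for GEM object",
    "[drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] "
    "*ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object",
    "[26485.000000] some unrelated kernel message",
]

for gpu_id, failures in count_nvkms_failures(sample_lines).items():
    print(f"GPU {gpu_id}: {failures} failed NVKMS allocations")
```

To use it on a real system, pass the lines of your own dmesg output instead of sample_lines.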

For me it happens in Elite Dangerous. I can speed up getting to an XWayland crash by watching YouTube videos on my secondary monitor. I have now switched to an AMD Radeon RX 7900 XTX and this fixed all the issues I ever had with desktop Linux.
I really hope NVIDIA will address this issue soon. I still have an RTX 3060 Ti built into my laptop.

@Fijxu thanks for being so dedicated and researching stuff. Since this issue is related to the proprietary NVIDIA driver, I think opening a GitHub issue in the open-gpu-kernel-modules repository would not do much.

I don’t play PC games and I’m still seeing related errors on my work laptop.

[13150.718760] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[13150.720663] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[13177.944113] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[13177.944149] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object

It has a 3050Ti with 4GB VRAM. Right now, nvidia-smi shows around 3GB of VRAM in use, with Firefox using over 1GB of it. Maybe that’s a Firefox bug?

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 555.58.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   55C    P8              7W /   35W |    3087MiB /   4096MiB |     13%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      4311      G   /usr/bin/kwin_wayland                         484MiB |
|    0   N/A  N/A      4438      G   /usr/bin/maliit-keyboard                      374MiB |
|    0   N/A  N/A      4496      G   /usr/bin/kded6                                  1MiB |
|    0   N/A  N/A      4505      G   /usr/bin/plasmashell                          208MiB |
|    0   N/A  N/A      4553      G   /usr/libexec/kactivitymanagerd                  1MiB |
|    0   N/A  N/A      4598      G   ...6/polkit-kde-authentication-agent-1          1MiB |
|    0   N/A  N/A      4599      G   /usr/libexec/org_kde_powerdevil                 1MiB |
|    0   N/A  N/A      4600      G   /usr/libexec/xdg-desktop-portal-kde             1MiB |
|    0   N/A  N/A      4751      G   kdeconnectd                                     1MiB |
|    0   N/A  N/A      4911      G   /usr/libexec/DiscoverNotifier                   1MiB |
|    0   N/A  N/A      5379      G   /usr/bin/kwalletd6                              1MiB |
|    0   N/A  N/A      5745      G   /usr/libexec/baloorunner                        1MiB |
|    0   N/A  N/A      7469      G   /usr/lib64/firefox/firefox                   1285MiB |
|    0   N/A  N/A      8039      G   ...bin/plasma-browser-integration-host          1MiB |
|    0   N/A  N/A      8136      G   /usr/bin/konsole                                1MiB |
|    0   N/A  N/A     98509      G   /usr/bin/Xwayland                               2MiB |
|    0   N/A  N/A    100804      G   ...erProcess --variations-seed-version        288MiB |
+-----------------------------------------------------------------------------------------+
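To see at a glance which processes hold the most VRAM, the process table above can be scraped with a few lines of Python. This is only a sketch: it assumes the exact column layout shown in this post, which is not a stable format, and the names parse_process_rows and ROW_RE are made up for illustration.

```python
import re

# Matches one process row of the table above:
# | GPU  GI  CI  PID  Type  Process name ...  <N>MiB |
ROW_RE = re.compile(
    r"^\|\s+\d+\s+\S+\s+\S+\s+(\d+)\s+\S+\s+(.+?)\s+(\d+)MiB\s+\|$"
)


def parse_process_rows(text):
    """Return a list of (pid, process_name, used_mib) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return rows


# Three rows copied from the output above, plus a header row that must be ignored.
sample = """\
| Processes:                                                                              |
|    0   N/A  N/A      4311      G   /usr/bin/kwin_wayland                         484MiB |
|    0   N/A  N/A      7469      G   /usr/lib64/firefox/firefox                   1285MiB |
|    0   N/A  N/A    100804      G   ...erProcess --variations-seed-version        288MiB |
"""

rows = parse_process_rows(sample)
# Largest consumers first.
for pid, name, used in sorted(rows, key=lambda row: row[2], reverse=True):
    print(f"{used:>6} MiB  {pid:>7}  {name}")
print(f"Total listed: {sum(used for _, _, used in rows)} MiB")
```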

Eventually the memory usage gets high enough that xwayland crashes and brings down half the system with it.

I’ve got 64GB system RAM, over half of which is unused…

My personal laptop is a Framework 16 with AMD graphics and I don’t experience this issue on there. Both are running the same OS (Fedora 40) with mostly the same apps (Firefox and VS Code being the two I use the most).