Non-existent shared VRAM on NVIDIA Linux drivers

Okay, so I did some research, too. One brute-force approach would be to patch the open kernel drivers in the following way (not tested!):

diff --git a/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c b/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c
index 6b92c753..5cdcf594 100644
--- a/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c
@@ -367,10 +367,25 @@ int nv_drm_dumb_create(
     allocParams.type = NVKMS_KAPI_ALLOCATION_TYPE_SCANOUT;
     allocParams.size = args->size;
     allocParams.noDisplayCaching = true;
-    allocParams.useVideoMemory = nv_dev->hasVideoMemory;
     allocParams.compressible = &compressible;

+    // First attempt: try to allocate in video memory if available and requested
+    NvBool originalUseVideoMemory = nv_dev->hasVideoMemory;
+    allocParams.useVideoMemory = originalUseVideoMemory;
     pMemory = nvKms->allocateMemory(nv_dev->pDevice, &allocParams);
+
+    if (pMemory == NULL && originalUseVideoMemory) {
+        NV_DRM_DEV_LOG_INFO(
+            nv_dev,
+            "VRAM allocation failed for dumb object of size %" NvU64_fmtu ", "
+            "attempting system memory fallback.",
+            args->size);
+
+        // Fallback attempt: try to allocate in system memory
+        allocParams.useVideoMemory = NV_FALSE;
+        pMemory = nvKms->allocateMemory(nv_dev->pDevice, &allocParams);
+    }
+
     if (pMemory == NULL) {
         ret = -ENOMEM;
         NV_DRM_DEV_LOG_ERR(
@@ -541,12 +556,27 @@ int nv_drm_gem_alloc_nvkms_memory_ioctl(struct drm_device *dev,
     allocParams.type = (p->flags & NV_GEM_ALLOC_NO_SCANOUT) ?
         NVKMS_KAPI_ALLOCATION_TYPE_OFFSCREEN : NVKMS_KAPI_ALLOCATION_TYPE_SCANOUT;
     allocParams.size = p->memory_size;
-    allocParams.useVideoMemory = nv_dev->hasVideoMemory;
     allocParams.compressible = &p->compressible;

+    // First attempt: try to allocate in video memory if available and requested
+    NvBool originalUseVideoMemory = nv_dev->hasVideoMemory;
+    allocParams.useVideoMemory = originalUseVideoMemory;
     pMemory = nvKms->allocateMemory(nv_dev->pDevice, &allocParams);
+
+    if (pMemory == NULL && originalUseVideoMemory) {
+        NV_DRM_DEV_LOG_INFO(
+            nv_dev,
+            "VRAM allocation failed for GEM object of size %" NvU64_fmtu ", "
+            "attempting system memory fallback.",
+            p->memory_size);
+
+        // Fallback attempt: try to allocate in system memory
+        allocParams.useVideoMemory = NV_FALSE;
+        pMemory = nvKms->allocateMemory(nv_dev->pDevice, &allocParams);
+    }
+
     if (pMemory == NULL) {
-        ret = -EINVAL;
+        ret = -ENOMEM; // Allocation failure, consistent with nv_drm_dumb_create above
         NV_DRM_DEV_LOG_ERR(nv_dev,
                            "Failed to allocate NVKMS memory for GEM object");
         goto nvkms_alloc_memory_failed;

But this will most likely result in abysmal performance for applications that request a device-local render surface but get a host-memory allocation back. I think the purpose of the flag is to explicitly get what you ask for, whereas the absence of the flag allows the driver to place or migrate memory however it thinks is optimal.

So at this point, the logic clearly implies: “If you ask for VRAM, then you get VRAM, or you don’t. Deal with it.”

That also means, judging by the example of niri, that these Wayland compositors always ask for device-local memory, which becomes a problem once VRAM turns into a bottleneck and the driver cannot migrate it.

So let’s analyze niri a bit further - I’ve used AI for this hypothesis but drew the conclusions myself:

  • Niri uses smithay, which provides its GBM allocator.
  • The interface doesn’t provide a way for niri to say “I want device-local memory” or “I don’t care”.
  • smithay’s GBM allocator uses flags like GBM_BO_USE_RENDERING, which signals the underlying driver (Mesa/NVIDIA) to provide a buffer optimized for GPU rendering. For performance reasons, the driver will always try to place such buffers in VRAM (device-local). This is great for performance but becomes a problem when VRAM is exhausted.
  • So essentially, niri only ever asks for device-local memory, not because it intends to, but because smithay is designed that way, without any fallback logic.
  • The NVIDIA driver is acting correctly here: it denies the request for device-local memory because it cannot satisfy it.
  • If smithay had a strategy to request buffers without GBM_BO_USE_RENDERING (e.g., as simple linear buffers), the driver would likely place them in system memory. However, these buffers might not be usable for zero-copy GPU rendering, which would kill compositor performance. The core issue is the lack of a “prefer VRAM, but allow system memory” hint in the API and the application’s logic.
  • Thus, smithay has no fallback logic (and neither does niri).
  • And seemingly, niri always expects memory allocations to succeed (which is bad in itself).
  • This still points to a weakness in the driver’s memory management. Even if an application rigidly requests device-local memory, an ideal driver should be able to make space by evicting other, less-used DEVICE_LOCAL resources to system memory in the background. The fact that it fails the allocation instead suggests this on-demand eviction isn’t working as robustly as it could. That is, unless all of the VRAM is allocated as DEVICE_LOCAL and the driver’s intention is to keep that memory there - then there is no chance of eviction. And apparently, niri does exactly that.

I’m pretty sure that other Wayland compositors show similarly suboptimal behavior.