DirectX 12 performance is terrible on Linux

After all, a graphics card is not a monolithic device: different tasks are handled by different units, and a bottleneck can occur in any one of them.

Some more scary evidence for NVIDIA to absorb.

3 Likes

Do we have an idea of roughly when we’ll be seeing the general optimization for VKD3D that was mentioned earlier?

They haven’t said which driver release will have the fix (they never do), but they started working on it a couple of months ago, so I personally think 585 will have it; worst case scenario it will be 590.

1 Like

Assetto Corsa Evo performance on Linux is bad too, and that game runs on DirectX 12.
Performance on Linux is very poor even on LOW graphics settings, while on Windows the same game runs pretty nicely on ULTRA (RTX 3080).

I believe that, for people with GPUs up to the 1000 series, only the NVK driver may bring any improvement in performance in the future.

2 Likes

I’m glad you said “mostly”, because I can’t even use the NVIDIA drivers for casual LLM scenarios on Linux: both GPU memory management and workload scheduling can best be described as broken. Most likely that’s because NVIDIA isn’t taking full advantage of the Linux-native Direct Rendering Manager (DRM) APIs, while on Windows they’re required to support everything WDDM offers.

Spanking the GPU resources of my 4070Ti on Windows is an incredibly stable experience. On a desktop Linux distro, I can’t even safely max out all of the VRAM (let alone have a process try to use 100% of its processing grunt) or bad things start to happen.

I just hope AMD gets to the point where non-training AI workloads start to really compete. That should force NVIDIA to at least fix the mess for their newer (20xx and above) cards.

Sorry to hear :( Have you posted somewhere on the CUDA - NVIDIA Developer Forums?
I guess I’ve mostly been lucky so far in this regard: I have three non-identical Nvidia eGPUs connected to my laptop for running local coding assistants, and memory-management-wise they work as expected 99.9% of the time (so far I’ve only had problems with one particular model on ollama, just two days ago: I haven’t managed to file a bug yet, hopefully this weekend).
UPDATE: to be clear about what I meant: I’m quite aware that the driver obviously lacks some critical features like VRAM swapping, but the primitives it does provide have been mostly stable for me so far.
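For what it’s worth, since there’s no VRAM swapping to fall back on, the workaround I rely on is just checking free VRAM before any large allocation. A rough sketch with the CUDA runtime API (error handling trimmed, and the 90% headroom factor is an arbitrary number of mine, adjust to taste):

```c
#include <stdio.h>
#include <cuda_runtime.h>   /* build with nvcc, or link against -lcudart */

int main(void) {
    size_t free_bytes = 0, total_bytes = 0;

    /* Ask the driver how much VRAM it currently reports as free. */
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }
    printf("VRAM: %zu MiB free of %zu MiB total\n",
           free_bytes >> 20, total_bytes >> 20);

    /* Leave some headroom instead of allocating right up to the limit,
       since the driver cannot transparently swap VRAM out. */
    size_t request = (size_t)(free_bytes * 0.9);
    void *buf = NULL;
    if (cudaMalloc(&buf, request) != cudaSuccess) {
        fprintf(stderr, "allocation of %zu MiB failed\n", request >> 20);
        return 1;
    }
    cudaFree(buf);
    return 0;
}
```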

Scheduling of local LLMs is mostly handled in user space (i.e. by llama.cpp, ollama, etc.). ollama in particular is known for poor scheduling, and to make things worse, they removed the ability to manage splitting manually in v0.11.5 (see the GitHub issue).

This has exactly ZERO to do with local LLMs. You can even remove the nvidia-drm module completely and all compute workloads will be unaffected. CUDA (and, for that matter, Vulkan) are lower-level layers than DRM.
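You can see this for yourself: CUDA goes through the /dev/nvidiactl and /dev/nvidiaN character devices rather than the DRM nodes under /dev/dri. A minimal CUDA driver-API sketch like this will still enumerate devices with nvidia-drm unloaded (sketch only, error handling mostly trimmed):

```c
#include <stdio.h>
#include <cuda.h>   /* CUDA driver API, link with -lcuda */

int main(void) {
    /* cuInit talks to /dev/nvidiactl + /dev/nvidiaN, not /dev/dri/renderD*,
       so it does not depend on the nvidia-drm kernel module being loaded. */
    if (cuInit(0) != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit failed\n");
        return 1;
    }

    int count = 0;
    cuDeviceGetCount(&count);
    printf("CUDA devices visible: %d\n", count);

    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        char name[256];
        cuDeviceGet(&dev, i);
        cuDeviceGetName(name, (int)sizeof(name), dev);
        printf("  [%d] %s\n", i, name);
    }
    return 0;
}
```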

That’s an even bigger problem than I first thought, then.

On Linux, DRM is used by every other major GPU vendor to mediate access so that the process holding DRM master access can always take priority, potentially allowing desktop workloads to still function alongside compute even if you’re spanking a card to its fullest. Software is still required to use the DRM render node device file (e.g. /dev/dri/renderD128) even to perform compute, whether it be Vulkan or OpenCL or any other API, making every workload subject to kernel GPU scheduling algorithms.
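To make the render-node part concrete, this is roughly what user-space drivers do before submitting any work. A minimal libdrm sketch (the device path is hard-coded purely for illustration; real code would enumerate the available devices):

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h>   /* libdrm, link with -ldrm */

int main(void) {
    /* Render nodes carry no modesetting rights and need no DRM master;
       they exist so unprivileged processes can submit GPU work, and that
       work is then subject to the kernel driver's scheduling. */
    int fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        perror("open renderD128");
        return 1;
    }

    /* Ask which kernel DRM driver is backing this node. */
    drmVersionPtr ver = drmGetVersion(fd);
    if (ver) {
        printf("kernel DRM driver: %s (%d.%d.%d)\n",
               ver->name, ver->version_major,
               ver->version_minor, ver->version_patchlevel);
        drmFreeVersion(ver);
    }

    close(fd);
    return 0;
}
```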

On Windows, everything (even CUDA) is still subject to the whims of WDDM and its included GPU scheduling algorithms, so if I absolutely spank every drop of resources, I’m still able to use my computer for other interactive purposes, because built-in scheduling of each workload ensures other applications get their fair share of access to the card’s resources. With each improvement Microsoft makes, every vendor driver benefits automatically.

2 Likes

As far as I know (please correct me if I’m wrong), people who use AMD GPUs for compute tasks use Vulkan, as ROCm has too many problems. Since Vulkan is lower-level and OS-agnostic, it seems to me that running compute tasks on AMD GPUs would also bypass this DRM scheduling mechanism you described, no?

Again, unless I terribly misunderstand these things, Vulkan workloads bypass WDDM for the same reasons, right? (i.e., because it’s a lower-level, OS-agnostic API, so it can’t possibly be aware of WDDM).
Now, CUDA kernels are also OS-agnostic, but I guess Nvidia might have put the WDDM integration into the host-level Windows libraries (DLLs, that is), though that’s just my speculation: do you happen to know how it’s integrated?

The WDDM diagram found here might explain it better: it shows that even third-party, OS-agnostic APIs like Vulkan and CUDA still have their resources scheduled by the kernel (via dxgmms2.sys) and are thus subject to the whims of WDDM. What Vulkan/CUDA can do is use their own multi-GPU APIs, but even that still has to contend fairly with other processes using those GPUs (as determined by dxgmms2.sys). Even Microsoft’s non-display, compute-optimised alternative to WDDM, known as MCDM, still forces all calls (irrespective of API) through the OS-provided GPU scheduler. NVIDIA advertises an alternative called TCC, which can be used to bypass WDDM entirely, but that’s only available for non-GeForce cards, and when it’s enabled the GPU can’t display anything, so it’s not intended for use outside of a datacentre.

On Linux, a well-written DRM driver is meant to fulfil a very similar role, with all access to GPU resources mediated by it, irrespective of whether OpenGL, EGL or Vulkan is used. If an application malfunctions and hogs resources, the application with master access (usually the compositor or display server) fundamentally gets more control over the card than other applications do.
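For a concrete picture of what “master access” means: the compositor opens the primary node and acquires DRM master, which is what lets it (and only it) drive the display while everyone else is confined to render nodes. A rough libdrm sketch (error paths trimmed; acquiring master will fail if another compositor already holds it, and generally needs the right privileges):

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h>   /* libdrm, link with -ldrm */

int main(void) {
    /* The primary node (card0) is where modesetting happens; only the
       process holding DRM master may commit display state on it. */
    int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        perror("open card0");
        return 1;
    }

    /* Fails if another process (a running compositor) already owns the
       card, or if we lack CAP_SYS_ADMIN on an already-mastered device. */
    if (drmSetMaster(fd) != 0) {
        perror("drmSetMaster");
        close(fd);
        return 1;
    }
    printf("acquired DRM master on card0\n");

    /* ... a compositor would do its KMS/atomic commits here ... */

    drmDropMaster(fd);   /* e.g. when switching away to another VT */
    close(fd);
    return 0;
}
```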

In practice, what we’re seeing is NVIDIA’s drivers working great on Windows because Microsoft requires it (otherwise there would be no NVIDIA drivers on Windows at all), while the Linux drivers are still working terribly because there’s no practical mechanism to force NVIDIA to do the right thing.

2 Likes