VRAM Allocation Issues

Hello,
Lately there seems to be a very serious VRAM allocation issue with DXVK/VKD3D and even native Vulkan applications.
It pushes VRAM usage toward its absolute limit (much higher than anything on Windows), and enabling DLSS does NOT free up VRAM the way it does on Windows.
To provide as much detail as possible: to start with, the following games seem to be highly affected:
Star Wars Jedi: Fallen Order, Ready Or Not, Doom Eternal, Gears 5, COD WWII.
Observed behavior in each title:
SW JFO: the game keeps consuming VRAM but does not go over the limit; the maximum total usage (Xorg/Wayland + DE apps + the game itself) reaches 7.5 GB out of 8 GB in my case. After entering the pause menu a few times with all settings on EPIC, including textures, the game slows to a crawl and becomes unplayable.
Relevant bug report: [d3d11] Jedi Fallen Order: fps has a massive drop after acessing the menu · Issue #2552 · doitsujin/dxvk · GitHub
Ready Or Not: in DXVK (DX11) VRAM usage gets extremely high and using DLSS does not affect it at all (changing in-game settings between low and epic makes no difference). The game reaches 7.9-8.0 GB of VRAM usage, which makes the entire desktop unresponsive or prevents a web browser from being opened (this applies to both Xorg and Wayland). On VKD3D, after it hits 8 GB of VRAM the game slows to a crawl.
Doom Eternal is affected by the same issue as Ready Or Not, which also makes recording with OBS + NVENC impossible (recording can't be started), while Gears 5 suffers from an issue similar to SW JFO, but that one triggers after a while when cutscenes or acts change.
COD WWII's performance also degrades after some time (usually it doesn't take long, approx. 5-30 minutes of playtime).
Other games may be affected too, but in general this looks like very high VRAM usage regardless of the in-game settings (in some titles even lowering the in-game settings doesn't help).

Just to update, this doesn't seem to be a DE issue, as I tried GNOME and Cinnamon and both show similar behavior, give or take slightly longer to trigger.

Operating System: EndeavourOS
KDE Plasma Version: 5.26.5
KDE Frameworks Version: 5.101.0
Qt Version: 5.15.8
Kernel Version: 6.1.6-273-tkg-pds (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5800X3D 8-Core Processor
Memory: 31,3 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3070/PCIe/SSE2 (8GB VRAM)


The VRAM allocation bug has existed for over a year now and NVIDIA refuses to fix it: https://redshift.maxon.net/topic/42190/super-slow-redshift-ipr-and-rendering/ - not sure if it's the exact same bug, but obviously the drivers suck at VRAM management.

Hi,
Unfortunately it isn’t possible to open the link provided.
However, if the issue looks like this, it may be the same one: VRAM fills up completely (for example, the full 8 GB used out of 8 GB), the entire session might also crash (on top of the issues mentioned above), and the memory is never freed or recovered while the application runs.
It also does not seem to take the actual VRAM on the GPU into consideration and goes on to use as much VRAM as the application wants (while on Windows the application is adjusted to the GPU's VRAM), which makes the allocation fail outright. A good example is a native Vulkan app such as Doom Eternal: if you enable DLSS and ray tracing and set everything to Ultra Nightmare (incl. textures), it straight up crashes on load with the error “failed to allocate vram”. This does suggest that it could be something in the driver itself. I haven't run into an issue with applications, but it could become a problem later down the line.
Unfortunately I am unable to debug more or see what is causing it to go into such excessive VRAM usage.

Regarding this report, it also seems to have some effect on Windows, as can be observed in this video: NimeZ Modded Drivers vs. AMD Legacy in 2023 - YouTube
Yes, it is AMD, but at the timestamp I've set it showcases how NVIDIA 471 (or 470 for Linux) barely uses 3.3 GB of VRAM, while on 526 (or 525 for Linux) it goes into excessive VRAM usage, going over the budget and starting to use system RAM.
This does suggest that it is a driver bug.

EDIT: Just to confirm, yes it is a regression. I downgraded the drivers to 470 and to my surprise it doesn’t go into excessive VRAM usage and uses far less. On 525 however it keeps growing more and more and more until it goes over the budget(examples below).
This issue is far worse with dxvk/vkd3d titles than native vulkan, but in all cases it is observed.
MangoHud shows the combined VRAM usage of DE + apps + the game itself.

470: [screenshot]

525: [screenshot]

EDIT 2: After some digging I was able to narrow down the search. Unfortunately I wasn't able to boot into either 495 or 510, but one of these two drivers introduced the regression.
Just FYI, a very interesting thing is observed: on 470 the reported VRAM is 7973 MB and usage stays within that limit; HOWEVER, on 525 the reported VRAM is 8192 MB and as time passes usage reaches about 7960 MB and even beyond that, so I believe the problem is in how VRAM is reported, showing more than what is available. Hopefully this can be addressed in a future update.


EDIT 3: That may not be the reason either, as after checking the model it seems that 8192 is the correct value, but something definitely changed in either 495 or 510 to cause this effect.

Final edit: After tinkering a little bit, it also doesn't seem to be the amount of VRAM itself; you can have 7.4 GB of VRAM used and performance would still degrade (good example: Portal with RTX)…

I also really hope that this weird issue will be fixed. It makes 4 GB GPUs (for example my RTX 3050 mobile) almost unusable on Linux. Once VRAM usage reaches 100%, FPS drops to 3-5 and never recovers. It is very strange that such a critical bug has gone unfixed for so long.

Does anybody know whether it is possible to submit an issue directly to NVIDIA somehow?

Here is some other issue description:

Definitely. If 8 GB cards feel such an impact, I'd be surprised if anything with less is even somewhat OK.
On the topic, I was able to find another regression: instead of lowering VRAM usage, DLSS increases it (again, no parity with how it is on Windows) in a title like Ready Or Not. This behavior is not seen in Cyberpunk, for example, but it is still a concern as to why it's happening.


It looks like NVIDIA's Vulkan driver isn't reporting the correct amount of available VRAM. Doom 2016's in-game overlay shows this, at least.

I have filed bug 3958948 internally for tracking purposes.
I shall try to reproduce the issue locally and get back to you if I require any additional information.


I tried to reproduce the issue on a couple of notebooks, which have an RTX 3080 Laptop GPU and a GeForce RTX 3060 Laptop GPU, by playing games like Ready or Not and Cyberpunk, but could not reproduce it.
I observed that about half of the VRAM was being utilized.
Please share an nvidia bug report so that I can try to match the hardware.
Also, please confirm the number of external displays connected and how long it usually takes for VRAM to be consumed completely.

Hello,
Unfortunately, I have swapped my RTX 3070 for a 3090 and no longer have access to it to provide a bug report, but I will provide as many details as possible.
I use 2 monitors, 1440p@165Hz and 1080p@144Hz; the game runs on the 1440p display. Ready Or Not manages to fill VRAM quite fast regardless of whether it's played on low, medium, high or epic settings, and enabling DLSS (in any of the quality modes) speeds up the filling of VRAM.
A quick test would be a 3070 with 8 GB of VRAM, playing Ready Or Not at 1440p on medium/high settings with DLSS enabled (any of the settings would do), and playing around a map (I found that Valley of the Dolls can fill it fairly quickly).
Initially usage is low, but the more of the house you explore, the more VRAM usage climbs (around 5-10 minutes in, you should already have filled the 8 GB).
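
For anyone who wants to put a number on "how long it takes to fill", here is a minimal Python sketch that polls nvidia-smi and reports when usage first crosses a threshold. It assumes nvidia-smi is on the PATH and only looks at the first GPU; the 97% threshold, the 5 second interval and the function names are my own illustrative choices, not anything from the driver.

```python
import subprocess
import time

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_sample(line):
    """Parse one 'used, total' CSV line (values in MiB) into two ints."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def seconds_until_full(samples, threshold=0.97):
    """Return the elapsed time of the first sample at or above threshold.

    samples is a list of (seconds_since_start, used_mib, total_mib).
    Returns None if usage never reaches the threshold.
    """
    for elapsed, used, total in samples:
        if used / total >= threshold:
            return elapsed
    return None


def record(duration_s=600, interval_s=5):
    """Poll the first GPU via nvidia-smi and collect samples for duration_s."""
    samples = []
    start = time.monotonic()
    while (elapsed := time.monotonic() - start) < duration_s:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        used, total = parse_sample(out.stdout.splitlines()[0])
        samples.append((round(elapsed), used, total))
        time.sleep(interval_s)
    return samples
```

Running `seconds_until_full(record())` in a terminal while playing should give a rough time-to-fill figure to attach to a report. Note that it counts everything on the GPU (desktop, apps and game combined), the same as the overlay numbers above.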

Hi @amrits !
Here is the report, generated right after the issue was reproduced. The easiest way to reproduce it is a 4 GB GPU such as my RTX 3050. This time I used Cyberpunk 2077 with high textures, but the issue reproduces even on low, after ~1 hour of gameplay, once VRAM usage reaches 100%. It actually happens in any game that can reach 100% VRAM usage. One more interesting detail: I tried to check it on a GTX 970M and it looks like there is no such issue there (but I'm not 100% sure).
I use the laptop screen and have no external screens connected.
nvidia-bug-report.log (18.5 MB)

This issue also occurs on my desktop GTX 970 4GB. It also happens when I open the (Esc) menu or change graphics settings (it doesn't matter whether I set them lower or higher) in some games like God of War, Far Cry 5 and some others.


I’m getting a failed to allocate vram error when running Deep Rock Galactic on Linux using Proton. I’m on the 525.78.01 driver. I just upgraded yesterday, and it just started happening yesterday.

Linux Mint, up to date. GTX970SC

I tried to reproduce the issue again on a couple of desktop configurations and notebooks, but I am not seeing a drop in performance or FPS. Below are the test results.

HP OMEN by HP 25L Gaming Desktop GT15-0xxx + Ubuntu 22.04.1 LTS + NVIDIA GeForce RTX 3080 + Driver 525.78.01
I played Ready Or Not in dxvk(dx11) for almost an hour and saw VRAM usage go up to ~7GB out of 10GB, but did not observe any drop in performance or FPS.

HP OMEN by HP 25L Gaming Desktop GT15-0xxx + Ubuntu 22.04.1 LTS + NVIDIA GeForce GTX 1050 Ti + Driver 525.78.01
I played Ready Or Not in dxvk(dx11) for almost an hour and saw VRAM usage go up to ~4GB out of 4GB, but did not observe any drop in performance or FPS.

TUXEDO Polaris AMD Gen2 (REN) + AMD Ryzen 7 4800H with Radeon Graphics + Ubuntu 22.04.1 LTS + NVIDIA GeForce RTX 3060 Laptop GPU + Driver 525.78.01
I played Ready Or Not in dxvk(dx11) for almost an hour and saw VRAM usage reach a maximum of around 5GB out of 6GB.

Acer Nitro AN515-45 + AMD Eng Sample: 100-000000300-40_Y + Ubuntu 20.04.5 LTS + NVIDIA GeForce RTX 3080 Laptop GPU + Driver 525.85.05
I played multiple games like Cyberpunk and Ready Or Not in dxvk(dx11) for some time but could not observe a drop in performance.

I will try a few more setups, but if anyone has more detailed repro information to share, please feel free to do so.
Thanks everyone for trying to help.


@amrits , thank you very much for taking the time to investigate this problem!

In my own experience with an RTX 3070 8GB (more specifically, a Gigabyte GeForce RTX 3070 Gaming OC), the card behaves pretty well in all games – with one very notable exception: Star Citizen.

Star Citizen seems to make heavy use of textures and is likely rather unoptimized. Their current PTU (“Public Test Universe”), in which they test their version 3.18, for me is practically unplayable on Linux on the 3070.

Here’s a recording of a short gameplay in one of the more resource-heavy locations in the game, “Lorville”, with nvtop in the background showing the VRAM usage: Star Citizen 3.18 PTU on NVIDIA 3070 8GB: severe FPS drop on VRAM exhaustion - YouTube . At around the 4:00 mark, the frame rate can be seen dropping to single-digit numbers with pauses between the frames of even several seconds.

The problem seems to be reliably reproducible for me on the 3.18 PTU, in every one of the frequently released newer builds. None of these builds exhibits the same problem on Windows, where the frame rate is reasonably stable even for extended play sessions of several hours, including in a KVM VM with a GPU passthrough.

Reporting lower VRAM to Star Citizen via the dxgi.maxDeviceMemory/maxSharedMemory parameters of DXVK can somewhat alleviate the problem, but only by postponing the severe frame rate drop perhaps to a few tens of minutes, or until a second texture-heavy location is visited in the game (usually the “towns”).
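
For reference, a minimal sketch of how such a cap could be written out. It assumes the two options are set in a dxvk.conf file with values in megabytes; the 6144/8192 figures and the helper name are illustrative only, not the values I actually settled on.

```python
def render_dxvk_conf(device_mb, shared_mb):
    """Return dxvk.conf text capping the memory DXVK reports to the game.

    device_mb: value for dxgi.maxDeviceMemory (VRAM reported, in MB)
    shared_mb: value for dxgi.maxSharedMemory (shared system memory, in MB)
    """
    return (
        f"dxgi.maxDeviceMemory = {device_mb}\n"
        f"dxgi.maxSharedMemory = {shared_mb}\n"
    )


# Illustrative values for an 8 GB card: report 6 GB of VRAM to the game.
conf_text = render_dxvk_conf(6144, 8192)
print(conf_text)
```

The resulting text would go into a dxvk.conf in the game's working directory, or into whatever file the DXVK_CONFIG_FILE environment variable points at. As said above, this only delays the drop; it does not change how the driver behaves once VRAM is actually exhausted.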

Lowering the graphics settings in Star Citizen doesn’t seem to alleviate the problem appreciably. It appears that – at the moment, at least – the lower settings do not reduce VRAM usage by much, so the problem appears even on the “Low” settings, albeit perhaps somewhat slower.

I’ve also noticed quite a few other people complaining about the same issues with Star Citizen on Linux, even on the “LIVE” version (currently at 3.17), and particularly those with NVIDIA cards, due to the generally lower amounts of VRAM. So, VRAM really seems a problem with Star Citizen, that’s likely only going to get worse as the game development progresses (seeing also how the optimization phase is being constantly postponed).

Comments by Valve’s kisak here and DXVK’s doitsujin here seem to suggest that the Linux driver could benefit from improved VRAM asset management, particularly in such VRAM-heavy scenarios. I really hope this could be achieved, considering how the performance on Windows is actually pretty commendable.

A couple of notes on my specific hardware setup, just to make it easier to decode the log file. It’s a Xorg multiseat, dual GPU system, with a 3070 8GB and a 3080 12GB (no problems with the latter due to its larger VRAM). The desktop on the 3070 is streamed remotely via Sunshine, which also adds somewhat to the VRAM usage. The Star Citizen launcher is also taking up a considerable amount of VRAM (200-500+ MB), but it seems impossible to close it without closing the game – and it’s likely the same on Windows anyway. Star Citizen is installed and set up with Lutris, mostly following the Star Citizen Linux Users Group wiki.

nvidia-bug-report.log.gz (2.4 MB)

If I could help with more information or testing, please don’t hesitate to ask.


@amrits Thank you for your time!
Most of the setups you used have enough VRAM to always stay below 100% usage. Try testing with an RTX 3050 mobile if you have the possibility. I can reproduce it in any game that can use more than 4 GB (Cyberpunk, Outer Wilds, Star Wars Jedi: Fallen Order).
Cyberpunk 2077 + RTX 3050 with everything on low + medium textures is a very good example. You will see the issue right when the game loads or after a few minutes of gameplay (with low textures it happens after ~1 hour).
My laptop is a GF75 Thin 10UC.

I have a 3080 FE with 10 GB of VRAM and also run into this issue.

This is not isolated to Star Citizen, but Star Citizen is the most “reliable” way to trigger VRAM exhaustion. It also happens with Elite Dangerous, and KSP2 EA also likes to allocate 8 GB of VRAM for no good reason.

To reproduce this I simply spawn at Orison, move to the Kel-to Mall, take a closer look at the spacewhale, take the shuttle to Cousin Crows, if by this point the game is still responsive I fly to HUR L5.

At HUR L5 your trip will come to an end. You can try switching to windowed mode to reduce VRAM usage a tiny bit and you might stick the landing, or you will stick the landing after closing the launcher.

Taking a screenshot at that moment will trigger a segmentation fault in libnvidia-glcore.so.

This problem also occurs with the 530 driver, but this driver has other issues. For example Elite Dangerous becomes completely unplayable due to MANY graphical artifacts appearing. Funny enough, it is only ED that shows these artifacts and going back to 525 “solves” the problem.

nvidia-bug-report.log.gz (381.3 KB)


When I took this screenshot nvidia-smi showed 9995/10240 VRAM usage.
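
To put that reading in perspective, a tiny Python sketch that turns a "used/total" figure into a percentage and remaining headroom. The helper name is made up for illustration and the input format is simply the way I wrote the numbers above.

```python
def vram_usage(reading):
    """Split a 'used/total' reading into (used, total, percent used)."""
    used, total = (int(part) for part in reading.split("/"))
    return used, total, round(100 * used / total, 1)


used, total, percent = vram_usage("9995/10240")
free = total - used  # headroom left at the moment of the screenshot
print(f"{percent}% used, {free} free")
```

So at that moment the card had only a couple of hundred units of headroom left out of 10240, which is where the stall kicks in for me.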

ED with 530:

Also note the temperature on my CPU going strong at 77, usually around 80 at that moment. There is no CPU usage to be found on htop at that moment either and while playing SC my CPU usually sits between 50-60.

I also have to add that VRAM usage on Linux vs Windows is inflated somewhat, because DXVK/VKD3D need to keep some resources in-memory as well.

Maybe it would be nice to test how DXVK behaves on Windows with Star Citizen, dropping the d3d11.dll in the Bin64 directory should do it!


@amrits
I have a Zotac Twin Edge OC 3070 and can replicate the issue in Star Citizen and other games with high VRAM usage.

I have tried a number of kernel and graphics driver combinations without any solution.

System Info:
Ryzen 5950X
32GB RAM
Arch Linux - Kernel 6.2.2
3 Monitors (2 DisplayPort, 1 HDMI)

Steps to replicate:
Spawn character in Star Citizen
Get into your ship
Fly to HUR L5 (this location in game has lots of asteroids and gas clouds - my guess is that the asset streaming overloads the VRAM)
FPS goes from 75 to 1FPS almost instantly
VRAM fills completely within seconds
Game is unplayable

nvidia-bug-report.log.gz (3.1 MB)


Hi @amrits !
I made a video with a repro of the bug. At 0:30 I open the map and FPS drops to 10 and never recovers; GPU usage becomes a constant 100% but the temperature goes down. Opening the map is just a way to allocate a bit more VRAM. It happens without the map as well when usage reaches 100%. Also, if I open the map when VRAM usage is much lower than 100%, everything is OK.
So I think the RTX 3050 mobile is the best GPU to investigate this on, because I can reproduce the same thing in almost every modern game.
Ping me if I can do something to help investigate this.


That's exactly how it is - the best 'tester' is Deathloop with everything on max: on the RTX 4070 Ti, after about 15 minutes of playing, there is the same hiccup as on the RTX 3050, at 100% of the 12 GB of VRAM. The same goes for e.g. Cyberpunk 2077 on a mobile RTX 3060 with 6 GB of VRAM: it doesn't matter that DLSS is on Performance, 100% of the 6 GB is occupied in a moment on the highest settings and FPS drops from e.g. 50 to 10. Under the 520 drivers, turning on DLSS Performance freed a lot of VRAM, but on 525 it is a tragedy: it keeps the VRAM busy all the time.
