I read that newer drivers implement offloading VRAM to CPU RAM, or treating CPU RAM as an extension of VRAM, for applications like Stable Diffusion or running large language models, so that larger models can be loaded.
Which driver version for Linux supports this? Do I need to set anything for this to work? My system has an RTX 3060 and an RTX 4070.
I am also looking into how to achieve this on Linux (CUDA – Sysmem Fallback Policy).
By default, offloading to RAM does not seem to be active.
Torch reports the following error when trying to use more than the available GPU memory:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.04 GiB. GPU 0 has a total capacity of 11.76 GiB of which 11.64 GiB is free.
My configuration is as follows:
Ubuntu 24.04
NVIDIA GeForce RTX 3060 12GB
Driver Version: 550.90.07
CUDA Version: 12.4
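To clarify what the missing policy would do: "sysmem fallback" means an allocation that no longer fits in VRAM spills into host RAM instead of raising an out-of-memory error. The sketch below is a purely illustrative Python model of that behavior (it is not NVIDIA's implementation and does not touch the driver); the pool sizes mirror the numbers from the error message above, and the 32 GiB host figure is a made-up example.

```python
class PoolAllocator:
    """Toy memory pool tracking usage in GiB (illustration only)."""

    def __init__(self, capacity_gib: float):
        self.capacity = capacity_gib
        self.used = 0.0

    def try_alloc(self, size_gib: float) -> bool:
        # Refuse the allocation if it would exceed the pool's capacity.
        if self.used + size_gib > self.capacity:
            return False
        self.used += size_gib
        return True


def alloc_with_fallback(vram: PoolAllocator, sysmem: PoolAllocator,
                        size_gib: float) -> str:
    """Try VRAM first; with a fallback policy, spill to system RAM."""
    if vram.try_alloc(size_gib):
        return "vram"
    if sysmem.try_alloc(size_gib):
        return "sysmem"
    raise MemoryError(f"cannot allocate {size_gib} GiB anywhere")


vram = PoolAllocator(11.76)   # usable capacity reported for the RTX 3060
sysmem = PoolAllocator(32.0)  # hypothetical host RAM headroom

print(alloc_with_fallback(vram, sysmem, 8.0))    # fits in VRAM
print(alloc_with_fallback(vram, sysmem, 13.04))  # the failing 13.04 GiB request
```

Without the fallback (the current Linux behavior), the second request would simply raise `torch.OutOfMemoryError`, which is what the traceback above shows.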
How has this still not received any comment? This is a critical feature, and its absence cripples all NVIDIA cards on Linux.
Because there is another thread about this: Non-existent shared VRAM on NVIDIA Linux drivers - #73 by lucasggamerm