High virtual memory consumption on Linux for CUDA programs: is it possible to avoid it?

Hi,

I see that when CUDA is used on Linux, a process's virtual memory consumption is large, approximately the size of the physical GPU memory plus the size of system memory.

Quoting this comment:
https://devtalk.nvidia.com/default/topic/493902/cuda-programming-and-performance/consumption-of-host-memory-increases-abnormally/

This is related to UVA. We have to carve out a chunk of virtual memory equal to the total physical GPU memory, plus the total system memory, plus some small fudge factor for alignment purposes. We actually throttle back on the UVA region if you run out of virtual memory; this will restrict the amount of memory you can allocate, though.

While this appears to have no performance consequences, is there some way to avoid it, for example by passing flags to cuCtxCreate() or by globally tweaking the driver settings?

Also, can someone link to documentation or a specification which states how virtual memory is used by CUDA on Linux? So far I have only found references to this behavior scattered across this forum and Stack Overflow.

Many thanks in advance.

One way to reduce the size of the allocation is to reduce the GPU “footprint” in a multi-GPU system, if not all GPUs are needed.

For example, if you have a CUDA code that only uses 1 GPU, but you run it on a system that has 4 GPUs, all 4 GPUs will contribute to the size of the virtual space reservation requested. You could reduce this, in this particular scenario, by setting the CUDA_VISIBLE_DEVICES environment variable to restrict the CUDA runtime to only a single GPU.

AFAIK there are no direct controls over virtual memory allocation performed by CUDA, and it is not formally documented anywhere.

Thanks Robert. The CUDA_VISIBLE_DEVICES trick could be useful, although only in limited scenarios.

Do you know whether it would be possible in principle to reduce the virtual memory usage, or whether this behavior is deeply ingrained in the Linux implementation?

Although it shouldn’t have any performance impact, it’s still a bit annoying, because it misleads users into thinking that the CUDA process is consuming a lot of memory.

In order to provide a unified address space, all physical memory (host system and GPUs) must be mapped into a single virtual space. As a result, CUDA’s virtual memory usage will look huge. I am not aware that this has negative consequences of any kind. What specific problems are you encountering?

I don’t see what is misleading here. The virtual memory usage is stated correctly by the operating system. Users may be thinking incorrectly about what that number means. If you care, you could educate them on virtual memory. Or you could pay no attention to what users think as long as things are working.

Agreed, as long as there are no performance penalties (and so far I have no evidence of any) I guess there is no issue with high virtual memory usage. Thanks.