OS: Ubuntu 20.04
Linux kernel version: 5.15.0-46-generic
Nvidia Driver version: 515.86.01
CUDA version: 11.7
GPU: NVIDIA RTX A3000 Laptop GPU
I ran a simple test that allocates over-subscribed GPU memory via CUDA unified memory, and executed the test binary under a Linux cgroup with a 500 MB memory limit, using the following commands:
sudo apt install cgroup-tools
sudo cgcreate -g memory:myGroup
sudo su
echo 500M > /sys/fs/cgroup/memory/myGroup/memory.limit_in_bytes
sudo cgexec -g memory:myGroup ./cuda_unified_memory_test
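For reference, here is a minimal sketch of the kind of test binary used above (the exact source was not posted, so the allocation size and behavior are assumptions; it allocates well over the 500 MB limit with cudaMallocManaged and touches every page from the CPU so the pages are actually populated in host RAM). It requires the CUDA toolkit and a GPU, and is compiled with nvcc:

```cpp
// Hypothetical sketch of cuda_unified_memory_test: over-subscribe
// managed memory relative to a 500 MB cgroup limit, then touch it all.
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ULL << 31;  // 2 GB, well above the 500 MB cgroup limit
    char *buf = nullptr;
    cudaError_t err = cudaMallocManaged(&buf, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Touch every page from the host so the memory is really consumed,
    // not just reserved. If the cgroup limit applied, we would expect
    // the OOM killer to fire during this loop.
    memset(buf, 0xAB, bytes);
    printf("touched %zu MB of managed memory\n", bytes >> 20);
    cudaFree(buf);
    return 0;
}
```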
After running this binary, even though it touches and consumes more than 500 MB, it is not OOM-killed by the Linux cgroup. Inspecting the process with pmap shows the large /dev/nvidia-uvm mappings, while the cgroup reports far less usage:
Linux# sudo pmap -p <pid of the test binary> | grep nvidia-uvm
0000000204600000    2048K rw-s- /dev/nvidia-uvm
00007fc060000000 4194304K rw-s- /dev/nvidia-uvm
Linux# cat /sys/fs/cgroup/memory/myGroup/memory.usage_in_bytes
84951040
Therefore, my assumption is that GPU memory over-subscribed through CUDA unified memory is not constrained by the Linux cgroup (we are using cgroup v1). Please let me know if I have missed anything, or if there is already an effort tracking or enhancing this.
From a production-impact perspective: if a process running on Kubernetes cannot be constrained by its Linux cgroup memory limit, it can trigger a system-level OOM, which is a much bigger issue for offline jobs running in production.