OS: Ubuntu 20.04
Linux kernel version: 5.15.0-46-generic
Nvidia Driver version: 515.86.01
CUDA version: 11.7
GPU: NVIDIA RTX A3000 Laptop GPU
I ran a simple test that allocates over-subscribed GPU memory via CUDA unified memory, and executed the test binary under a Linux cgroup with a 500 MB memory limit, using the following commands:
sudo apt install cgroup-tools
sudo cgcreate -g memory:myGroup
sudo su
echo 500M > /sys/fs/cgroup/memory/myGroup/memory.limit_in_bytes
sudo cgexec -g memory:myGroup ./cuda_unified_memory_test
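For reference, here is a minimal sketch of the kind of test binary used above (the exact source was not posted, so the allocation size and behavior are assumptions; it allocates well over the 500 MB limit with cudaMallocManaged and touches every page from the CPU so the pages are actually populated in host RAM). It requires the CUDA toolkit and a GPU, and is compiled with nvcc:

```cpp
// Hypothetical sketch of cuda_unified_memory_test: over-subscribe
// managed memory relative to a 500 MB cgroup limit, then touch it all.
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ULL << 31;  // 2 GB, well above the 500 MB cgroup limit
    char *buf = nullptr;
    cudaError_t err = cudaMallocManaged(&buf, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Touch every page from the host so the memory is really consumed,
    // not just reserved. If the cgroup limit applied, we would expect
    // the OOM killer to fire during this loop.
    memset(buf, 0xAB, bytes);
    printf("touched %zu MB of managed memory\n", bytes >> 20);
    cudaFree(buf);
    return 0;
}
```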
After running this binary, even though it touches and consumes more than 500 MB, it is not OOM-killed by the Linux cgroup. Inspecting the process with pmap shows the large /dev/nvidia-uvm mappings, while the cgroup reports far less usage:
Linux# sudo pmap -p <pid of the test binary> | grep nvidia-uvm
0000000204600000    2048K rw-s- /dev/nvidia-uvm
00007fc060000000 4194304K rw-s- /dev/nvidia-uvm
Linux# cat /sys/fs/cgroup/memory/myGroup/memory.usage_in_bytes
84951040
Therefore, my assumption is that GPU memory over-subscribed through CUDA unified memory is not constrained by the Linux cgroup (we are using cgroup v1). Please let me know if I have missed anything, or if there is already an effort tracking or enhancing this.
From a production-impact perspective: if a process running on Kubernetes cannot be constrained by its Linux cgroup memory limit, it can trigger a system-level OOM, which is a much bigger issue for offline jobs running in production.