CUDA unified memory usage is not accounted for by Linux cgroups

System environment:
OS: Ubuntu 20.04
Linux kernel version: 5.15.0-46-generic
Nvidia Driver version: 515.86.01
CUDA version: 11.7

I ran a simple test that uses CUDA unified memory to allocate GPU memory with over-subscription, and ran the test binary under a Linux cgroup with a memory limit of 500 MB, using the following commands:

sudo apt install cgroup-tools
sudo cgcreate -g memory:myGroup
echo 500M | sudo tee /sys/fs/cgroup/memory/myGroup/memory.limit_in_bytes
sudo cgexec -g memory:myGroup ./cuda_unified_memory_test
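
One way to check whether the limit was ever hit is to read the group's cgroup v1 counters while the test runs and after it exits. The following is a minimal sketch, not part of the original test: the function name show_memcg_counters is hypothetical, and the default path assumes the myGroup example above on a cgroup v1 hierarchy.

```shell
#!/usr/bin/env bash
# Print the cgroup v1 memory counters for one group directory.
# Assumption: the default path matches the myGroup example above;
# pass another directory as the first argument to override it.
show_memcg_counters() {
  local dir="${1:-/sys/fs/cgroup/memory/myGroup}"
  local f
  for f in memory.limit_in_bytes memory.usage_in_bytes \
           memory.max_usage_in_bytes memory.failcnt; do
    if [ -r "$dir/$f" ]; then
      # Each of these files holds a single integer value.
      printf '%s %s\n' "$f" "$(cat "$dir/$f")"
    else
      printf '%s unavailable\n' "$f"
    fi
  done
}
```

If memory.max_usage_in_bytes stays well below the limit and memory.failcnt stays at 0 while the process touches far more than 500 MB, that would suggest the memory is not being charged to the group.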

After running this binary, we could see that even though it touches and consumes more than 500 MB, it is not OOM-killed by the Linux cgroup, while the pmap output for the process shows large nvidia-uvm mappings:

Linux# sudo pmap -p <pid of the test binary> | grep nvidia-uvm
0000000204600000   2048K rw-s- /dev/nvidia-uvm
00007fc060000000 4194304K rw-s- /dev/nvidia-uvm
Linux# cat /sys/fs/cgroup/memory/myGroup/memory.usage_in_bytes
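
To compare the two numbers directly, the nvidia-uvm mapping sizes reported by pmap can be summed and set against the cgroup limit. This is a rough sketch under stated assumptions: the function name sum_uvm_kib is hypothetical, it expects pmap's default column layout (address, size with a K suffix, permissions, path), and the sizes it adds up are mapping sizes, not necessarily resident memory.

```shell
#!/usr/bin/env bash
# Sum the sizes (in KiB) of all /dev/nvidia-uvm mappings.
# Reads pmap output on stdin and prints a single integer.
sum_uvm_kib() {
  awk '/nvidia-uvm/ { sub(/K$/, "", $2); total += $2 }
       END { printf "%d\n", total }'
}

# Hypothetical usage, with PID set to the pid of the test binary:
#   sudo pmap -p "$PID" | sum_uvm_kib
#   echo $(( $(cat /sys/fs/cgroup/memory/myGroup/memory.limit_in_bytes) / 1024 ))
```

For the two mappings listed above, the sum is 4196352 KiB, compared with 512000 KiB for the 500M limit.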

Therefore, my assumption is that GPU memory over-subscribed through CUDA unified memory is not constrained by the Linux cgroup (we are currently using cgroup v1). Please let me know if I missed anything, or if there is already an effort to track and improve this.

From a production impact perspective, if a process running on Kubernetes is not constrained by its cgroup memory limit, it can lead to a system-wide OOM, which could be a bigger issue for offline jobs running in production.