Hello,
I have a question about the minimum granularity when allocating GPU memory with cuMemCreate.
On my system, an RTX 3090 with CUDA 12,
cuMemGetAllocationGranularity reports a minimum allocation granularity of 2MB.
I wonder where this 2MB came from.
I suspect this might be the GPU huge-page size, but why isn’t it the normal GPU page size, such as 4KB?
Is there any way to change this minimum allocation granularity?
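For reference, this is roughly how I query it (a minimal sketch for device 0; error checking omitted):

```cpp
#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // Describe a pinned allocation located on device 0.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t granularity = 0;
    cuMemGetAllocationGranularity(&granularity, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    printf("minimum granularity: %zu bytes\n", granularity);  // prints 2097152 (2MB) here

    cuCtxDestroy(ctx);
    return 0;
}
```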
Thanks.
Hey, I’m also curious about the same thing. Have you found out anything about it?
Hi @t-rprabhu
First of all, I couldn’t find any official comments on this question. However, I did come across some articles explaining this topic.
Inside the implementation of PyTorch, one of the most widely used machine learning frameworks, there’s a component that uses the CUDA Virtual Memory Management (VMM) APIs.
This part of the code explains the CUDA VMM as follows:
“When we allocate a new segment, we allocate enough address space to map essentially the entire physical memory of the GPU (which is 256TiB of address space). However, we only map as much physical memory as is needed by the program at the moment. As more memory is requested, we add more physical memory to the segment. This can work at the granularity of GPU pages, which are currently 2MiB.”
I believe 2MiB is the current page size NVIDIA GPUs use for these APIs, and mapping memory at a granularity smaller than 2MiB is not possible at this time.
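As a concrete illustration (my own minimal sketch, not PyTorch’s actual code; error checking omitted), this is the typical VMM pattern: reserve a large virtual address range, then create and map physical chunks in multiples of the reported granularity:

```cpp
#include <cuda.h>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t granularity = 0;
    cuMemGetAllocationGranularity(&granularity, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);

    // Reserve a large virtual range up front (room for many chunks).
    CUdeviceptr base;
    size_t reserve = 64 * granularity;
    cuMemAddressReserve(&base, reserve, 0, 0, 0);

    // Create one physical chunk; the size must be a multiple of the granularity.
    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, granularity, &prop, 0);

    // Map it at the start of the reservation and enable access from the device.
    cuMemMap(base, granularity, 0, handle, 0);
    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(base, granularity, &access, 1);

    // ... use (void*)base; map more chunks as the program needs them ...

    cuMemUnmap(base, granularity);
    cuMemRelease(handle);
    cuMemAddressFree(base, reserve);
    cuCtxDestroy(ctx);
    return 0;
}
```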
I hope this information is helpful.
Source (PyTorch): pytorch/c10/cuda/CUDACachingAllocator.cpp at main · pytorch/pytorch · GitHub
Thanks.
@woosungkang Thanks for the speedy and helpful reply!
Hi, I have a question about allocation. For example, if we have two buffers, a and b, both much smaller than the page size (2MiB), will they be placed in the same page, or will they occupy two separate 2MiB pages?
Thanks.
A page, once mapped for one allocation, cannot also be mapped for another allocation.
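In other words, each cuMemCreate allocation is rounded up to the 2MiB granularity and gets its own physical backing, so two separately allocated small buffers consume two 2MiB chunks; packing them into one page requires suballocating within a single allocation yourself. A small illustration of the rounding (the helper name is my own):

```cpp
#include <cstddef>
#include <cstdio>

// Illustrative helper: cuMemCreate sizes must be multiples of the
// reported granularity, so each allocation is rounded up on its own.
size_t round_up(size_t size, size_t granularity) {
    return ((size + granularity - 1) / granularity) * granularity;
}

int main() {
    const size_t granularity = 2 * 1024 * 1024;  // 2MiB, as reported on this system
    size_t a = round_up(4096, granularity);      // 2MiB for buffer a
    size_t b = round_up(4096, granularity);      // another 2MiB for buffer b
    printf("a + b = %zu bytes\n", a + b);        // 4MiB total, not 2MiB
    return 0;
}
```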