About cudaMallocManaged() granularity

Hello.

I have some questions about cudaMallocManaged() granularity. When I allocate unified memory with cudaMallocManaged() and the size is under 128 MiB, the reported free memory decreases by 128 MiB (e.g., I allocated 16 MiB of unified memory, but the free memory is not 40 GiB - 16 MiB; it drops by a full 128 MiB). Why is this happening?

Below is my code flow.

GPU: NVIDIA A100 40GB
CUDA driver version: 525.85.12
CUDA version: 12.0

cudaMemGetInfo() // 1
cudaMallocManaged(&arr, 16 MiB)
init(arr)
cudaMemPrefetchAsync(arr, 16 MiB, device)
cudaMemGetInfo() // 2

The drop in free memory between (1) and (2) is not 16 MiB; it is 128 MiB.
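For completeness, here is the flow above as a minimal compilable sketch (the float element type, the host-side init loop, and prefetching to device 0 are placeholders for my actual code; error checking omitted for brevity):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 16ull * 1024 * 1024;   // 16 MiB

    size_t free1, free2, total;
    cudaMemGetInfo(&free1, &total);             // (1)

    float *arr = nullptr;
    cudaMallocManaged(&arr, bytes);

    // init(arr): touch the allocation on the host
    for (size_t i = 0; i < bytes / sizeof(float); ++i) arr[i] = 0.0f;

    cudaMemPrefetchAsync(arr, bytes, /*dstDevice=*/0);  // device 0 assumed
    cudaDeviceSynchronize();

    cudaMemGetInfo(&free2, &total);             // (2)
    printf("free(1) - free(2) = %zu MiB\n", (free1 - free2) >> 20);

    cudaFree(arr);
    return 0;
}
```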

I also have this problem with UVM in general. I'm waiting for an answer.

There is apparently some allocation granularity in CUDA. The details of it are not published. Here are some examples of related questions/discussion.
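One way to observe the granularity empirically is to allocate managed buffers of increasing size, prefetch them to the device, and watch how much the reported free memory drops each time. A rough sketch (device 0 and the size range are assumptions; cudaMemGetInfo deltas can also include unrelated overhead, so treat the numbers as indicative only):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaFree(0);  // force context creation so it doesn't skew the first measurement

    for (size_t mib = 1; mib <= 512; mib *= 2) {
        size_t before, after, total;
        cudaMemGetInfo(&before, &total);

        void *p = nullptr;
        cudaMallocManaged(&p, mib << 20);
        cudaMemPrefetchAsync(p, mib << 20, /*dstDevice=*/0);  // populate pages on device 0 (assumed)
        cudaDeviceSynchronize();

        cudaMemGetInfo(&after, &total);
        printf("requested %4zu MiB -> free memory dropped by %4zu MiB\n",
               mib, (before - after) >> 20);
        cudaFree(p);
    }
    return 0;
}
```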