Hello.
I have a question about cudaMallocManaged() granularity. When I allocate unified memory with cudaMallocManaged() with a size under 128 MiB, the reported free memory decreases by 128 MiB. For example, I allocate 16 MiB of unified memory, but the free global memory is not 40 GiB - 16 MiB; it drops by 128 MiB instead. Why does this happen?
Below is my code flow.
GPU: NVIDIA A100 40GB
CUDA driver version: 525.85.12
CUDA version: 12.0
cudaMemGetInfo() // (1)
cudaMallocManaged(&arr, 16 MiB)
init(arr) // touch the memory on the host
cudaMemPrefetchAsync(arr, 16 MiB, device)
cudaMemGetInfo() // (2)
The difference between (1) and (2) is not 16 MiB; it is 128 MiB.
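For reference, here is a minimal standalone sketch of the flow above that reproduces the measurement. The 16 MiB size, the host-side init loop, and device 0 as the prefetch target are assumptions filled in from my description; adjust them to your setup.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBefore = 0, freeAfter = 0, total = 0;

    cudaMemGetInfo(&freeBefore, &total);          // (1)

    const size_t bytes = 16u << 20;               // 16 MiB (assumed size)
    float *arr = nullptr;
    cudaMallocManaged(&arr, bytes);

    // init(arr): touch the pages on the host first
    for (size_t i = 0; i < bytes / sizeof(float); ++i) arr[i] = 1.0f;

    // Prefetch to device 0 (assumed) and wait for it to complete
    cudaMemPrefetchAsync(arr, bytes, /*dstDevice=*/0);
    cudaDeviceSynchronize();

    cudaMemGetInfo(&freeAfter, &total);           // (2)

    // Observed on the A100 40GB: this prints 128, not 16
    printf("free delta: %zu MiB\n", (freeBefore - freeAfter) >> 20);

    cudaFree(arr);
    return 0;
}
```

Compile with `nvcc -o mminfo mminfo.cu` and run on the A100 to see the 128 MiB delta.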