Information about Cuda Memory Consumption on TK1, problem with cudaMemGetInfo()

I have noticed that cudaMemGetInfo() does not provide helpful information when trying to analyze memory consumption on a TK1:
the reported free memory does not decrease when memory is allocated with cudaMalloc, cudaMallocHost, or cudaMallocManaged. It does decrease when memory is allocated with plain malloc.

I noticed that I can track CUDA memory allocations by parsing /proc/self/maps and counting the mappings that are backed by /dev/nvmap. Using that method I discovered that all CUDA allocations are at least 1 MiB in size. I could not find this behavior documented; I expected allocations to be aligned to the 4 KiB page size, not to 256 pages.
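For reference, the tracking method described above can be sketched as follows. This is a minimal, hedged example: the maps excerpt in `sample` is made up for illustration, and the exact field layout assumes the usual `start-end perms offset dev inode path` format of /proc/self/maps.

```python
def nvmap_usage(maps_text):
    """Count the /dev/nvmap mappings in a /proc/self/maps dump
    and sum their sizes. Each line has the form
    'start-end perms offset dev inode path'."""
    count = 0
    total = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5] == "/dev/nvmap":
            start, end = (int(x, 16) for x in fields[0].split("-"))
            total += end - start
            count += 1
    return count, total

# Hypothetical excerpt: two CUDA allocations, each rounded up to 1 MiB
# (0x100000 bytes), plus one unrelated mapping that must be ignored.
sample = """\
b0000000-b0100000 rw-s 00000000 00:06 1234 /dev/nvmap
b0100000-b0200000 rw-s 00000000 00:06 1234 /dev/nvmap
b0200000-b0221000 r-xp 00000000 b3:01 5678 /usr/lib/libfoo.so"""

count, total = nvmap_usage(sample)
print(count, total)  # prints: 2 2097152
```

On an actual TK1 one would pass `open("/proc/self/maps").read()` instead of the canned sample, calling the function before and after each CUDA allocation to see the delta.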

  • Is this behavior configurable?
  • Is there a technical reason for the specific value of 1 MiB?

I also checked that two consecutive allocations of 256 KiB result in two separate mappings of 1 MiB each; the unused remainder of the first allocation is not reused by the second call to (e.g.) cudaMalloc.

I’ve run my code on a Jetson TX2 as well. There, the free memory reported by cudaMemGetInfo() does decrease, allocations show up as mappings of anon_inode:dmabuf rather than /dev/nvmap, and their unused remainders can be reused by successive calls. Most importantly, the TX2 has far more total memory, so I don’t need to optimize memory consumption as aggressively there.


No, this behavior is not configurable. The 1 MiB allocation granularity (the GPU page size) cannot be customized.