CUDA Memory allocation size preference

From my reading of the code, PyTorch allocates GPU memory in various sizes, but its caching allocator prefers 2MB segments.

Is there any H/W or S/W reason for preferring a 2MB size?
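For context, here is a minimal Python sketch of the rounding policy I see in the allocator. The constants mirror those in `c10/cuda/CUDACachingAllocator.cpp`; the function names are mine, and this is only an illustration of the size policy, not the real allocator:

```python
# Sketch of PyTorch's CUDA caching allocator size policy (illustration only).
# Constants mirror c10/cuda/CUDACachingAllocator.cpp; names are hypothetical.

MIN_BLOCK = 512                   # every request is rounded up to 512 B
SMALL_SIZE = 1 * 1024 * 1024      # requests <= 1 MB go to the "small" pool
SMALL_BUFFER = 2 * 1024 * 1024    # small pool cudaMalloc's 2 MB segments
MIN_LARGE_ALLOC = 10 * 1024 * 1024
LARGE_BUFFER = 20 * 1024 * 1024   # mid-size requests get 20 MB segments
ROUND_LARGE = 2 * 1024 * 1024     # big requests rounded to 2 MB multiples

def round_size(nbytes: int) -> int:
    """Round a request up to a multiple of 512 B."""
    return ((nbytes + MIN_BLOCK - 1) // MIN_BLOCK) * MIN_BLOCK

def segment_size(nbytes: int) -> int:
    """Size of the cudaMalloc segment that would back this request."""
    n = round_size(nbytes)
    if n <= SMALL_SIZE:
        return SMALL_BUFFER        # small tensors share 2 MB segments
    if n < MIN_LARGE_ALLOC:
        return LARGE_BUFFER        # mid-size tensors share 20 MB segments
    return ((n + ROUND_LARGE - 1) // ROUND_LARGE) * ROUND_LARGE

print(segment_size(4096))          # a small tensor is backed by a 2 MB segment
print(segment_size(64 * 1024 * 1024))
```

So the "2MB" I observe is the small-pool segment size: many sub-1MB tensors are packed into shared 2MB segments, and very large requests are still rounded to 2MB multiples.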

I have gathered some fragments of information:

  1. When I checked the CPU memory allocation paths, PyTorch uses `cudaHostAlloc` even though that memory lives on the CPU.

  2. x86 supports 4KB, 2MB, and 1GB pages. These are the page sizes the CPU prefers when allocating memory for itself.

  3. When loading data to the GPU, the CUDA driver needs the source to be in pinned (page-locked) memory.
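On point 3, pinning happens at host page granularity, which may be part of the connection to the page sizes in point 2. A hedged sketch, using an anonymous `mmap` as a stand-in for the page-locked buffer that `cudaHostAlloc` would return (the real page-locking is done inside the driver; `pinned_size` is a name I made up):

```python
import mmap

def pinned_size(nbytes: int, page: int = mmap.PAGESIZE) -> int:
    """Page-locking works on whole pages, so a pinned buffer's real
    footprint is the request rounded up to a page multiple."""
    return ((nbytes + page - 1) // page) * page

# A 5000-byte host staging buffer occupies two 4 KB pages once pinned.
buf = mmap.mmap(-1, pinned_size(5000, 4096))  # anonymous mapping as a stand-in
buf.close()
```

If the allocator's preferred segment size lines up with a hardware page size (2MB huge pages), each segment can be mapped with fewer TLB entries, which is one plausible H/W motivation I would like confirmed.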