Hello, I have a question about page migration when using Unified Virtual Memory.
If the same page is accessed repeatedly or within a short time window, page-granularity migration (driven by GPU page faults and the corresponding TLB updates) should be effective.
However, for non-reuse pages (i.e. streaming pages), I think it would be more effective to stream the data to the SMs rather than migrate the pages.
I was wondering if CUDA has a page management method similar to my idea.
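To make the idea concrete, here is a sketch of the kind of hint I am imagining, written in terms of the cudaMemAdvise hints (ptr, bytes, and advise_streaming are placeholder names of my own; ptr is assumed to come from cudaMallocManaged):

```cpp
#include <cuda_runtime.h>

// Sketch: hints that keep a streaming buffer resident in host memory so the
// GPU reads it over PCIe instead of fault-migrating whole pages.
// ptr/bytes/advise_streaming are placeholder names, not CUDA APIs.
void advise_streaming(void* ptr, size_t bytes, int device) {
    // Keep the physical pages in host memory (no migration on GPU touch).
    cudaMemAdvise(ptr, bytes, cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);
    // Map the pages into the GPU's page tables so accesses go over PCIe
    // instead of raising faults that migrate 4KB pages.
    cudaMemAdvise(ptr, bytes, cudaMemAdviseSetAccessedBy, device);
}
```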
Also, migrating in 4KB (page-size) increments makes sense when the data locality can actually be exploited, but I don't think it is effective otherwise.
If you only need 4 B or so of data from a page, the rest of the migrated data is wasted.
Of course, I know that page migration over an interconnect like PCIe moves data in 4KB chunks, but I'm still wondering whether there are methodologies that address this.
Managed memory copies pages over PCIe whenever they are touched by the GPU or the CPU. The driver decides when a page has to be migrated again or copied back (e.g. after writes from the GPU or CPU, or when GPU memory is full).
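A minimal sketch of the managed path (the kernel and sizes are just illustrative):

```cpp
#include <cuda_runtime.h>

__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // first GPU touch faults the page over
}

int main() {
    const int n = 1 << 20;
    float* data;
    cudaMallocManaged(&data, n * sizeof(float)); // one pointer for CPU and GPU
    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // pages populate on the host

    // Optional: migrate everything to device 0 up front instead of
    // paying a page fault per 4KB page during the kernel.
    cudaMemPrefetchAsync(data, n * sizeof(float), 0, 0);

    scale<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();

    float check = data[0];  // CPU touch migrates the page back
    (void)check;
    cudaFree(data);
    return 0;
}
```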
Zero-copy memory stays in CPU RAM, but the GPU can access it over PCIe even in small amounts (e.g. a few bytes) without copying full pages. It is typically faster than managed memory, except when you access the same pages repeatedly.
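A sketch of the zero-copy pattern for exactly the sparse case you describe (the gather kernel and the sizes are just for illustration):

```cpp
#include <cuda_runtime.h>

__global__ void gather(const float* table, const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Each thread reads 4 bytes from host RAM over PCIe; no page migrates.
    if (i < n) out[i] = table[idx[i]];
}

int main() {
    const int tableN = 1 << 22;  // large table that stays host-resident
    const int n = 1024;          // only a few sparse elements are needed
    float* table;
    // Pinned, mapped host allocation: visible to the GPU without migration.
    cudaHostAlloc(&table, tableN * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < tableN; ++i) table[i] = (float)i;

    float* d_table;
    cudaHostGetDevicePointer(&d_table, table, 0);

    int* d_idx; float* d_out;
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(float));
    int idx[n];
    for (int i = 0; i < n; ++i) idx[i] = i * 1024;  // one float per 4KB page
    cudaMemcpy(d_idx, idx, n * sizeof(int), cudaMemcpyHostToDevice);

    gather<<<(n + 255) / 256, 256>>>(d_table, d_idx, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_idx); cudaFree(d_out); cudaFreeHost(table);
    return 0;
}
```

With managed memory this gather would migrate one 4KB page per element read; with zero-copy each access moves roughly a cache line over PCIe.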