Page streaming with UVM system

Hello, I have a question about page migration when using Unified Virtual Memory.

If the same page is accessed repeatedly, or with only a short gap between accesses, page-granularity migration (driven by page faults and TLB updates) is effective.
However, for non-reused pages (i.e. streaming pages), I think it would be more effective to stream the data to the SMs rather than migrate the pages.

I was wondering if CUDA has a page management method similar to my idea.

Also, migrating in 4KB (page-size) increments makes sense if the data locality can be fully exploited, but I don't think it is effective otherwise.
If you only need 4B or so of data from a page, the rest of the data is thrown away.
Of course, I know that when data is transferred over an interconnect like PCIe it moves in 4KB chunks, but I'm still wondering if there are methodologies to address this.

The reference I used is the following link:
maximizing-unified-memory-performance-cuda

It is possible to use zero-copy accesses, where the pages are not migrated; the data is streamed instead.
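A minimal sketch of what zero-copy access can look like (the kernel, sizes, and names here are illustrative, not from this thread): the host buffer is allocated as pinned, mapped memory, and the GPU reads it over PCIe at access granularity, so no pages are ever migrated.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative streaming kernel: each thread reads each element exactly once,
// so no page would ever benefit from being migrated to device memory.
__global__ void sum_stream(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);
}

int main() {
    const int n = 1 << 20;

    // Zero-copy: pinned host memory mapped into the GPU's address space.
    float *h_in, *d_in;
    cudaHostAlloc((void**)&h_in, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void**)&d_in, h_in, 0);
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float* d_out;
    cudaMallocManaged(&d_out, sizeof(float));
    *d_out = 0.0f;

    // The GPU reads h_in over PCIe as it is accessed; no pages migrate.
    sum_stream<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    printf("sum = %f\n", *d_out);

    cudaFreeHost(h_in);
    cudaFree(d_out);
    return 0;
}
```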

Thank you for the fast reply @Curefab !!

So you mean the accessed page (4KB) is used only once, and the unused bytes are thrown away.
Did I understand correctly?

Zero-copy memory is different from managed memory:

Managed memory copies pages over PCIe whenever they are used by the GPU or CPU. The driver decides when data has to be copied again or copied back (e.g. after writes from the GPU or CPU, or when GPU memory is full).
Zero-copy memory stays in CPU RAM, but the GPU can access it over PCIe even in small amounts (e.g. a few bytes) without copying full pages. It is typically faster than managed memory, unless you access the same pages repeatedly.
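For completeness, a sketch of how the two allocations differ in code. The `cudaMemAdvise` hints shown are a possible middle ground for the streaming case from the original question: they ask the driver to keep managed pages resident in CPU RAM while still giving the GPU a direct mapping, so accesses stream over PCIe instead of faulting whole pages in (sizes and the device-ID handling are illustrative):

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;
    int dev = 0;
    cudaGetDevice(&dev);

    // Managed memory: pages migrate between CPU and GPU on demand.
    float* managed;
    cudaMallocManaged(&managed, bytes);

    // Optional hints: prefer CPU residency for these pages, but establish a
    // GPU mapping so accesses stream over PCIe rather than triggering migration.
    cudaMemAdvise(managed, bytes, cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);
    cudaMemAdvise(managed, bytes, cudaMemAdviseSetAccessedBy, dev);

    // Zero-copy memory: pinned host allocation, never migrated; the GPU
    // accesses it through the device pointer at whatever granularity it reads.
    float *host, *devPtr;
    cudaHostAlloc((void**)&host, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer((void**)&devPtr, host, 0);

    cudaFreeHost(host);
    cudaFree(managed);
    return 0;
}
```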

I appreciate your kind reply @Curefab .
