I’m implementing a memory pool that manages both host memory and device memory on top of Unified Memory.
I’d now like to move data between host and device explicitly to improve performance, but I run into a problem in the following case:
(1) Allocate a buffer “X” using cudaMallocManaged().
(2) “X” ends up resident in host memory (because of some earlier execution).
(3) The device writes to “X” without reading it first (write-only).
In this case, (3) causes page faults and data migration for “X”, so I want to move the pages of “X” to the device before (3).
However, cudaMemPrefetchAsync() performs not only the page migration but also a host-to-device copy of the data. In the write-only case this copy is unnecessary and adds communication overhead.
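For reference, here is a minimal sketch of the scenario in steps (1) to (3), with the prefetch placed before the write-only kernel. The names `fill`, `write_only_after_prefetch`, and `CHECK` are just placeholders for this post, not part of my pool. It assumes a device that supports concurrent managed access, and the pre-CUDA 13 signature of cudaMemPrefetchAsync() that takes an `int` destination device; newer toolkits take a `cudaMemLocation` instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Write-only kernel: every element is overwritten, nothing is read.
__global__ void fill(float* x, size_t n, float value) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) x[i] = value;
}

// Runs steps (1) to (3) and returns the last element as seen by the host.
float write_only_after_prefetch(size_t n, float value) {
    const size_t bytes = n * sizeof(float);
    int device = 0;
    CHECK(cudaGetDevice(&device));

    float* x = nullptr;
    CHECK(cudaMallocManaged(&x, bytes));   // (1) managed allocation
    std::memset(x, 0, bytes);              // (2) host touch, pages become host-resident

    // Move the pages to the device before the kernel runs. As described
    // above, this also transfers the (unneeded) contents host-to-device.
    // Assumption: pre-CUDA 13 signature with an int destination device.
    CHECK(cudaMemPrefetchAsync(x, bytes, device, 0));

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    fill<<<blocks, threads>>>(x, n, value);  // (3) write-only access on the device
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    const float last = x[n - 1];  // host read; the page migrates back on demand
    CHECK(cudaFree(x));
    return last;
}

int main() {
    const float last = write_only_after_prefetch(1 << 24, 1.0f);
    std::printf("last element = %f\n", last);
    return 0;
}
```

Removing the cudaMemPrefetchAsync() call gives the page-fault variant of (3), so the two cases can be compared in a profiler such as Nsight Systems.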
Is it possible to migrate only the pages, without copying the data?