I’m looking to do some read-only compute on data in a mmap()'d file on Linux. It seems like I can cudaHostRegister() this memory and use zero-copy to access it on the device. As I understand it, zero-copy accesses are not cached on the device, so if I launch a kernel twice over the same memory range, every access will cross the PCIe bus both times and performance will suffer. (Is this understanding wrong?)
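For context, here's roughly what I have in mind — a sketch, not tested code; the file name, kernel, and sizes are just illustrative:

```cuda
#include <cuda_runtime.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

__global__ void sumKernel(const float *data, size_t n, float *out) {
    // Naive read-only pass over the mapped data; every read of data[i]
    // goes to host memory via zero-copy.
    float acc = 0.0f;
    for (size_t i = threadIdx.x; i < n; i += blockDim.x)
        acc += data[i];
    atomicAdd(out, acc);
}

int main() {
    int fd = open("data.bin", O_RDONLY);   // hypothetical input file
    struct stat st;
    fstat(fd, &st);
    size_t bytes = st.st_size;
    void *host = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);

    // Pin the file-backed range and map it into the device address space.
    cudaHostRegister(host, bytes, cudaHostRegisterMapped);
    float *dev = nullptr;
    cudaHostGetDevicePointer((void **)&dev, host, 0);

    float *dOut;
    cudaMalloc(&dOut, sizeof(float));
    cudaMemset(dOut, 0, sizeof(float));
    sumKernel<<<1, 256>>>(dev, bytes / sizeof(float), dOut);
    sumKernel<<<1, 256>>>(dev, bytes / sizeof(float), dOut); // second launch: same bus traffic again?
    cudaDeviceSynchronize();

    cudaHostUnregister(host);
    munmap(host, bytes);
    close(fd);
    return 0;
}
```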
As a result, I’d like to use Unified Memory so that the CUDA driver can handle migrating and caching the pages on the device itself. I understand this to be one of the major distinctions between Unified Memory and plain Unified Virtual Addressing (UVA).
Is there a way to register the mmap() pointer with the Unified Memory management system? If not, is there a better option than skipping UVA/UM entirely and building some host-to-device caching scaffolding in my own host code?
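In case it helps frame the question, the fallback scaffolding I'm imagining is something like the following — again just a sketch with illustrative names, where I copy the mapped data once and reuse the device buffer across launches:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Fallback: explicit host->device staging cache, reused across kernel
// launches. All names here are hypothetical.
static void  *dScratch    = nullptr;
static size_t cachedBytes = 0;
static const void *cachedHost = nullptr;

// Returns a device copy of [host, host + bytes), copying over PCIe only
// when the cached contents are stale.
const void *getDeviceCopy(const void *host, size_t bytes) {
    if (bytes > cachedBytes) {            // grow the device buffer if needed
        cudaFree(dScratch);
        cudaMalloc(&dScratch, bytes);
        cachedBytes = bytes;
        cachedHost = nullptr;             // force a fresh copy
    }
    if (host != cachedHost) {             // copy once, reuse thereafter
        cudaMemcpy(dScratch, host, bytes, cudaMemcpyHostToDevice);
        cachedHost = host;
    }
    return dScratch;
}
```

This works, but it feels like I'd be reimplementing exactly the migration bookkeeping that Unified Memory already does, which is why I'm hoping the mmap() pointer can be handed to UM directly.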