From the whitepaper I can tell that in CC mode it’s still using UVM for memory management. I’m wondering if the page prefetching and oversubscription (two key features of UVM) are still in use when it comes to CC mode? If so, could you please briefly illustrate how it works? Thanks!
Yes, most of UVM operates the same way in CC mode, including prefetch and oversubscription. UVM handles the encryption/decryption paths transparently. Only cudaHostRegister() and other pinned host-memory usages have issues.
cudaMalloc-based allocations (e.g., cudaMallocHost()) will still try to respect affinity.
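To make the prefetch part concrete, here is a minimal sketch of ordinary managed-memory code. Since the encryption/decryption is described above as transparent, the assumption is that the same application code applies in CC mode; nothing here is CC-specific. The helper name `scale_managed` and the kernel `scale` are made up for illustration, and the `cudaMemPrefetchAsync` calls assume the CUDA 12.x (or earlier) signature that takes an `int` destination device. Passing an `n` larger than GPU memory would be one way to exercise oversubscription on systems that support it, but that is not shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Multiply every element by `factor` on the GPU.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper (not part of any NVIDIA API): returns true if every
// element came back from the GPU scaled by `factor`.
bool scale_managed(size_t n, float factor) {
    float *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return false;

    // First touch on the CPU: pages start out resident in host memory.
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    int device = 0;
    cudaGetDevice(&device);

    // Optional hint: migrate pages to the GPU ahead of the kernel instead of
    // faulting them in on demand. Assumes the CUDA 12.x signature; the return
    // value is ignored because the kernel still works without the prefetch.
    cudaMemPrefetchAsync(data, n * sizeof(float), device, 0);

    unsigned threads = 256;
    unsigned blocks = (unsigned)((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(data, n, factor);

    // Hint the pages back to the host before the CPU reads them.
    cudaMemPrefetchAsync(data, n * sizeof(float), cudaCpuDeviceId, 0);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    // Verify on the CPU: 1.0f * factor is exactly `factor`.
    for (size_t i = 0; ok && i < n; ++i) ok = (data[i] == factor);

    cudaFree(data);
    return ok;
}
```

Calling `scale_managed(1 << 20, 2.0f)` from `main` should return `true` on a machine with a working CUDA device. The prefetch calls are only hints, so removing them should change performance but not the result.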
Thanks so much!
It’s thrilling to hear that, but I do wonder how these optimizations work in CC mode, since every transfer involves encryption/decryption and that may slow things down. Is there any open-source code for this part? Sorry, I’m new to UVM :).
The open-source parts of our driver are on GitHub: NVIDIA/open-gpu-kernel-modules (NVIDIA Linux open GPU kernel module source).
There are some slowdowns, as we currently have to use the CPU to perform the encrypt/decrypt operations on the host side (the GPU has dedicated hardware for these operations).
However, if your workloads are compute-bound rather than IO-bound, then you can run at near theoretical performance on several workloads :)