you don’t need cudaDeviceEnablePeerAccess anymore, which leaves cudaMalloc calls fast and pays the cost of the additional mappings only where they are needed
Then, if I want to share some memory between devices using the virtual memory management mechanism, do I still need to check for p2p access capability before I use it?
Peer access is automatically enabled if you handle it via the function referenced in the blog:
" The cuMemSetAccess function allows you to target specific allocations to peer map to a specific set of devices."
If you have done that correctly, and the VMM API calls have not returned an error, then the peer mappings you specified exist. There is no need to check for them, and there is no need to enable them separately either, as indicated in the blog.
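For concreteness, here is a minimal sketch of that flow with the driver API, assuming two P2P-capable devices (0 and 1); the allocation size and device IDs are placeholders, not anything prescribed by the blog, and error handling is reduced to a macro:

```cpp
#include <cuda.h>
#include <cstdio>

#define CHECK(call)                                         \
    do {                                                    \
        CUresult res_ = (call);                             \
        if (res_ != CUDA_SUCCESS) {                         \
            const char *msg_;                               \
            cuGetErrorString(res_, &msg_);                  \
            printf("%s failed: %s\n", #call, msg_);         \
            return 1;                                       \
        }                                                   \
    } while (0)

int main() {
    CHECK(cuInit(0));
    CUdevice dev0, dev1;
    CHECK(cuDeviceGet(&dev0, 0));
    CHECK(cuDeviceGet(&dev1, 1));

    // Initialize both devices and make device 0's context current.
    CUcontext ctx0, ctx1;
    CHECK(cuDevicePrimaryCtxRetain(&ctx0, dev0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx1, dev1));
    CHECK(cuCtxSetCurrent(ctx0));

    // The physical allocation lives on device 0.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = 0;

    size_t gran = 0;
    CHECK(cuMemGetAllocationGranularity(&gran, &prop,
                                        CU_MEM_ALLOC_GRANULARITY_MINIMUM));
    size_t size = gran;  // size must be a multiple of the granularity

    // Create physical memory, reserve a VA range, and map the two together.
    CUmemGenericAllocationHandle handle;
    CHECK(cuMemCreate(&handle, size, &prop, 0));
    CUdeviceptr ptr;
    CHECK(cuMemAddressReserve(&ptr, size, 0, 0, 0));
    CHECK(cuMemMap(ptr, size, 0, handle, 0));

    // Grant read/write access to device 0 and to the peer, device 1.
    // If the peer mapping is not possible, this call returns an error;
    // there is no separate "enable peer access" step.
    CUmemAccessDesc access[2] = {};
    access[0].location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access[0].location.id = 0;
    access[0].flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    access[1].location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access[1].location.id = 1;
    access[1].flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(ptr, size, access, 2));

    // ... kernels on either device can now dereference ptr ...

    CHECK(cuMemUnmap(ptr, size));
    CHECK(cuMemRelease(handle));
    CHECK(cuMemAddressFree(ptr, size));
    CHECK(cuDevicePrimaryCtxRelease(dev0));
    CHECK(cuDevicePrimaryCtxRelease(dev1));
    return 0;
}
```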
No, because you specify exactly the peer mappings you want in the cuMemSetAccess function. Since you have not used cudaDeviceEnablePeerAccess(), other calls to cudaMalloc are unaffected (and they are not peer-mapped).
P2P access is a hardware feature, and we have two ways of using it:
cudaDeviceEnablePeerAccess: works for cudaMalloc-allocated memory, and maps all previous and future cudaMalloc allocations into the peer's address space, which therefore slows down future cudaMalloc calls (see the sketch after this list).
cuMemSetAccess: works for cuMemCreate-allocated memory (virtual memory management), maps only the specified memory for peer access, and does not affect future memory allocations.
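To contrast with the VMM sketch above, here is a minimal sketch of the first path with the runtime API, again assuming P2P-capable devices 0 and 1, with error checking omitted for brevity:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    if (!canAccess01 || !canAccess10) {
        printf("Devices 0 and 1 are not P2P-capable with each other\n");
        return 0;
    }

    // Enable access in both directions. After this, every past and future
    // cudaMalloc allocation on either device is peer-mapped to the other
    // device, which is where the extra cudaMalloc overhead comes from.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Allocate on device 1; a kernel running on device 0 can now
    // dereference d1_buf directly, and cudaMemcpyPeer can copy between them.
    float *d1_buf = nullptr;
    cudaMalloc(&d1_buf, 1 << 20);

    cudaSetDevice(0);
    float *d0_buf = nullptr;
    cudaMalloc(&d0_buf, 1 << 20);
    cudaMemcpyPeer(d0_buf, 0, d1_buf, 1, 1 << 20);

    cudaFree(d0_buf);
    cudaSetDevice(1);
    cudaFree(d1_buf);
    return 0;
}
```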
I would point out that in the cudaDeviceEnablePeerAccess case, the subsequent cudaMalloc operations that are “slowed down” are the ones pertaining to the devices in the peer relationship, not necessarily all future cudaMalloc operations (if there are other devices).