We want to use UVA in a multi-process training setup. Is it possible for a unified virtual address space to be shared by different processes or GPUs?
Currently we can run multi-process training with each process holding its own copy of this space in CPU memory. How can I make it shareable instead? Thank you!
You can use CUDA IPC. There are sample codes that demonstrate it.
Yes, it is clear that CUDA IPC can be used to share device memory pointers between different GPUs. But from my perspective, UVA is not quite the same thing. It provides a single virtual address space for all the memory in the system, and we can access these pointers from GPU code no matter where the memory resides. Do you think such addresses can be shared between different processes?
I think if you use CUDA IPC, you can share access to device memory amongst all the GPUs in the system, more or less regardless of which process they are in. I make no statements beyond that regarding UVA or IPC. I’m not understanding whatever distinction you are making.
In general, an address from one process is not usable in another process, even if the address spaces are harmonized, because processes in modern operating systems don’t work that way. Process isolation is a thing: each process has its own virtual address space. Using an address in a process where the corresponding memory hasn’t been properly set up (mapped) simply won’t work. I think that statement is pretty much true for host code on the OSes I am familiar with, and it is the same for device code.
You can’t take a host memory address from one process, and use that numerical value in another process, without any support or preparation. You need something like an IPC mechanism to make that work. Similar statements apply to device memory addresses.
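The process-isolation point can be seen on the host side without any GPU. The sketch below is a minimal illustration, assuming a POSIX system (it relies on os.fork); the function name private_vs_shared is hypothetical. After the fork, a ctypes integer has the same numerical address in parent and child, yet the child’s write to it is not visible to the parent. Only memory that was explicitly set up to be shared (here an anonymous mmap, which Python maps MAP_SHARED by default on Unix) carries the write across. This is a host-memory analogy only, not CUDA IPC.

```python
import ctypes
import mmap
import os


def private_vs_shared():
    """Fork once and report what the parent sees after the child writes 42.

    Returns (child_exit_code, private_value, shared_value).
    POSIX-only (os.fork). Host-memory analogy, not CUDA IPC.
    """
    # Ordinary process memory: after fork each process has its own copy
    # (copy-on-write), even though the virtual address is the same.
    private = ctypes.c_int(0)
    # Anonymous mmap: on Unix, Python maps it MAP_SHARED by default, so the
    # parent and the forked child share the same physical pages.
    shared_map = mmap.mmap(-1, ctypes.sizeof(ctypes.c_int))
    shared = ctypes.c_int.from_buffer(shared_map)
    parent_addr = ctypes.addressof(private)

    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            # Same numerical address as in the parent...
            same_addr = ctypes.addressof(private) == parent_addr
            private.value = 42  # ...but this only changes the child's copy.
            shared.value = 42   # This changes pages both processes map.
            status = 0 if same_addr else 2
        finally:
            os._exit(status)  # leave without running the parent's cleanup

    _, wait_status = os.waitpid(pid, 0)
    result = (os.waitstatus_to_exitcode(wait_status), private.value, shared.value)
    del shared  # release the ctypes view before closing the mapping
    shared_map.close()
    return result
```

On Linux, private_vs_shared() should return (0, 0, 42): the child confirms it sees the same address for private, but its write lands in its own copy, while the write through the shared mapping reaches the parent. For device memory, the corresponding preparation step is exporting a handle with cudaIpcGetMemHandle in one process and opening it with cudaIpcOpenMemHandle in the other.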
Regarding “we can access these pointers from GPU code no matter where they reside”: if that GPU code resides in different processes, and you are not also using CUDA IPC, that statement is false.
UVA, all by itself, isn’t particularly useful or interesting. It is interesting as an enabler for other interesting things, such as being able to introspect pointers and being able to share the same pointer (numerical value) between host code and device code, for memory regions that are accessible by both, such as managed memory or pinned memory.
Yes, I meant that the access is possible within the same process. Thank you for pointing out my mistake. I will try using CUDA IPC to achieve my goal.