CUDA IPC vs NVSHMEM for shared memory between applications


I want to run two applications that share the same device memory, mainly for image-processing tasks. I looked at CUDA IPC and NVSHMEM but can’t find a clear explanation of which is better for single-GPU use cases.

Are there other approaches to sharing device memory between applications besides these two? I want to avoid copying data off the device and back again when passing it between processes, to save the time spent in memcpy operations. Are there any libraries, such as OpenCV, that already support this?

Thank you

NVSHMEM uses CUDA IPC under the hood, I believe. IPC is a “lower level” approach, NVSHMEM a “higher level” one, and NVSHMEM provides many capabilities that IPC doesn’t. However, suggesting one is “better” than the other probably doesn’t make sense; it depends on your needs.

If you need only simple sharing of buffers, IPC may offer a simpler approach. If you want more complex interactions, including inter-process synchronization, you would have to “roll your own” with IPC (although that is probably not very difficult using CUDA IPC events), whereas NVSHMEM provides these capabilities “natively”. However, NVSHMEM carries additional complexity, e.g. around environment setup: it requires a multi-process bootstrap application launcher, much like MPI (in fact, it can use MPI as its bootstrap launcher).
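To illustrate the “roll your own” IPC route, here is a minimal sketch of sharing a device buffer between two processes with the CUDA runtime IPC API. The handle transport mechanism (a file here, named `ipc_handle.bin` for illustration) is an assumption; any IPC channel such as a pipe or socket works, and error checking and synchronization are omitted for brevity.

```cuda
// Minimal CUDA IPC sketch: run once as "producer", then once as consumer.
// The producer allocates device memory and exports an IPC handle; the
// consumer opens the handle and sees the SAME device allocation, with
// no device-to-host copy in between.
#include <cuda_runtime.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    if (argc > 1 && strcmp(argv[1], "producer") == 0) {
        float *d_buf;
        cudaMalloc(&d_buf, 1024 * sizeof(float));
        cudaIpcMemHandle_t handle;
        cudaIpcGetMemHandle(&handle, d_buf);
        // Transport the handle to the other process; a file is the
        // simplest illustration.
        FILE *f = fopen("ipc_handle.bin", "wb");
        fwrite(&handle, sizeof(handle), 1, f);
        fclose(f);
        // ... fill d_buf, signal the consumer (e.g. via a CUDA IPC
        // event), and keep this process alive while the buffer is used.
    } else {
        cudaIpcMemHandle_t handle;
        FILE *f = fopen("ipc_handle.bin", "rb");
        fread(&handle, sizeof(handle), 1, f);
        fclose(f);
        float *d_buf;
        cudaIpcOpenMemHandle((void **)&d_buf, handle,
                             cudaIpcMemLazyEnablePeerAccess);
        // d_buf now aliases the producer's allocation: kernels launched
        // here operate directly on the shared device memory.
        cudaIpcCloseMemHandle(d_buf);
    }
    return 0;
}
```

For synchronization you would extend this in the same style: create an event with `cudaEventCreateWithFlags(..., cudaEventInterprocess | cudaEventDisableTiming)`, export it with `cudaIpcGetEventHandle`, and open it in the other process with `cudaIpcOpenEventHandle`.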

I’m not aware that OpenCV uses either of these. Does OpenCV natively support inter-process activity? If it doesn’t, then there would be no reason for it to use either of these.

Why do you need two processes? Could you also use three processes, with the middle one running CUDA on behalf of the other two?