CUDA IPC vs NVSHMEM for shared memory between applications


I want to run 2 applications that share the same device memory, mainly for image processing tasks. I looked at CUDA IPC and NVSHMEM but can’t find clear explanations of which is better for single-GPU use cases.

Are there other approaches to sharing device memory between applications besides these 2 options? I want to avoid copying data to and from the device when passing it between processes, to save the time spent in memcpy operations. Are there any libraries that already support this, like OpenCV?

Thank you

NVSHMEM uses CUDA IPC under the hood, I believe. IPC is a “lower level” approach; NVSHMEM is a “higher level” approach. There are many capabilities that NVSHMEM provides that IPC doesn’t. However, suggesting one is “better” than the other probably doesn’t make sense - it depends on your needs.

If you need only simple sharing of buffers, IPC may offer a simpler approach. If you want more complex interactions, including inter-process synchronization, you would have to “roll your own” with IPC (although that is probably not very difficult using CUDA IPC events), whereas NVSHMEM provides these capabilities “natively”. However, NVSHMEM carries with it additional complexity, e.g. around environment setup: NVSHMEM requires the use of a multi-process bootstrap application launcher, much like MPI (in fact, it can use MPI as its bootstrap launcher).
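As a rough sketch of what “rolling your own” synchronization with CUDA IPC events might look like (the `producer`/`consumer` split and the streams are illustrative assumptions, and error checking is omitted; how the event handle travels between the processes is up to you):

```cuda
#include <cuda_runtime.h>

// App1 side: create a shareable event and export its IPC handle.
// An event used across processes must be created with BOTH
// cudaEventDisableTiming and cudaEventInterprocess.
void producer(cudaStream_t stream, cudaIpcEventHandle_t *out) {
    cudaEvent_t ev;
    cudaEventCreateWithFlags(&ev,
        cudaEventDisableTiming | cudaEventInterprocess);
    cudaIpcGetEventHandle(out, ev);
    // ... pass *out to app2 by any means (file, pipe, socket, ...) ...
    // ... launch kernels on `stream` that fill the shared buffer ...
    cudaEventRecord(ev, stream);   // signal: data is ready
}

// App2 side: import the handle and make its own stream wait on it.
void consumer(cudaStream_t stream, cudaIpcEventHandle_t in) {
    cudaEvent_t ev;
    cudaIpcOpenEventHandle(&ev, in);
    cudaStreamWaitEvent(stream, ev, 0);  // waits until app1 records
    // ... launch consuming kernels on `stream` ...
}
```

Note this only orders GPU work; any host-side coordination (e.g. "app2 may now open the handle") still needs an ordinary OS-level IPC mechanism.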

I’m not aware that OpenCV uses either of these. Does OpenCV natively support inter-process activity? If it doesn’t, then there would be no reason for it to use either of these.

Why do you need two processes? Could you also use 3 processes, with the middle one running CUDA for the other two?

But in either case, how do I share the memory address between the 2 applications?

I have app1 which creates the GPU memory and writes an image to it.

App2 is supposed to read this image from the GPU and perform resizing, for example.

I am assuming that if I just pass the device memory address to the other application, it will not be valid because the context is different. Is there a way to share the context between the 2 applications? I couldn’t find much CUDA documentation on this.

There is a CUDA sample code that demonstrates how. There are also various questions on forums that cover the topic. You can find those with a bit of searching. Here is one. Here is another.
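A minimal sketch of the mechanism the sample demonstrates, assuming the handle is passed via a file (the file name, image size, and process split are made up for illustration; error checking omitted). The key point is that the raw device pointer is never passed between processes - instead an opaque `cudaIpcMemHandle_t` is exported, and the importing process gets its own pointer to the same allocation:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// ---- App1 (producer): allocate, export a handle, keep alive ----
void app1() {
    float *d_image = nullptr;
    cudaMalloc(&d_image, 1920 * 1080 * sizeof(float));
    // ... write the image into d_image here ...

    cudaIpcMemHandle_t handle;               // opaque, 64 bytes
    cudaIpcGetMemHandle(&handle, d_image);

    FILE *f = fopen("ipc_handle.bin", "wb"); // any IPC channel works
    fwrite(&handle, sizeof(handle), 1, f);
    fclose(f);
    // The allocation must stay alive until app2 is done with it.
}

// ---- App2 (consumer): import the handle, map the allocation ----
void app2() {
    cudaIpcMemHandle_t handle;
    FILE *f = fopen("ipc_handle.bin", "rb");
    fread(&handle, sizeof(handle), 1, f);
    fclose(f);

    float *d_image = nullptr;
    cudaIpcOpenMemHandle((void **)&d_image, handle,
                         cudaIpcMemLazyEnablePeerAccess);

    // d_image now refers to app1's allocation in app2's context;
    // launch a resize kernel on it directly - no host round trip.

    cudaIpcCloseMemHandle(d_image);          // unmap when finished
}
```

Note that the handle must come from a `cudaMalloc` allocation (not managed memory), and the two calls must happen in different processes.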

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.