CUDA IPC vs NVSHMEM for shared memory between applications

Hi,

I want to run two applications that share the same device memory, mainly for image processing tasks. I looked at CUDA IPC and NVSHMEM but couldn't find a clear explanation of which is better for single-GPU use cases.

Are there approaches to sharing device memory between applications other than these two? I want to avoid extra copies when passing data between processes, to save the time spent in memcpy operations. Are there any libraries, such as OpenCV, that already support this?

Thank you

NVSHMEM uses CUDA IPC under the hood, I believe. IPC is a "lower level" approach, NVSHMEM is a "higher level" approach. There are many capabilities that NVSHMEM provides that IPC doesn't. However, suggesting one is "better" than the other probably doesn't make sense; it would depend on your needs.

If you only need simple sharing of buffers, IPC may offer a simpler approach. If you want more complex interactions, including inter-process synchronization, you would have to "roll your own" with IPC (although that is probably not very difficult using CUDA IPC events; see the sketch below), whereas NVSHMEM provides these capabilities "natively". However, NVSHMEM carries additional complexity, e.g. around environment setup: it requires a multi-process bootstrap application launcher, much like MPI (in fact, it can use MPI as its bootstrap launcher).
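To make the "roll your own" synchronization idea concrete, here is a rough sketch of how two processes could use an interprocess CUDA event so that the consumer only reads the shared buffer after the producer has finished writing it. This is not a complete program: `sendToApp2()`/`recvFromApp1()` stand in for whatever IPC channel (file, pipe, socket) you choose, and `writeImageKernel`/`resizeKernel` are hypothetical kernels, not CUDA API.

```cpp
// ---- process A (producer) ----
cudaEvent_t ready;
// interprocess events must be created with these two flags
cudaEventCreateWithFlags(&ready, cudaEventDisableTiming | cudaEventInterprocess);

writeImageKernel<<<grid, block>>>(d_image);   // produce the data
cudaEventRecord(ready);                       // mark "buffer is complete"

cudaIpcEventHandle_t evHandle;
cudaIpcGetEventHandle(&evHandle, ready);
// send the handle only after recording, so the consumer cannot wait too early
sendToApp2(&evHandle, sizeof(evHandle));      // hypothetical IPC channel

// ---- process B (consumer) ----
cudaIpcEventHandle_t evHandle;
recvFromApp1(&evHandle, sizeof(evHandle));    // hypothetical IPC channel

cudaEvent_t ready;
cudaIpcOpenEventHandle(&ready, evHandle);

cudaStreamWaitEvent(0, ready, 0);             // stream waits for A's record
resizeKernel<<<grid, block>>>(d_image, d_out);
```

This handle-and-event exchange is exactly the kind of plumbing NVSHMEM would handle for you, which is the trade-off described above.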

I’m not aware that OpenCV uses either of these. Does OpenCV natively support inter-process activity? If it doesn’t, then there would be no reason for it to use either of these.

Why do you need two processes? Could you also use 3 processes, the middle one running CUDA for the other two?

But in either case, how do I share the memory address between the two applications?

I have app1, which allocates the GPU memory and writes an image to it.

App2 is supposed to read this image from GPU memory and perform an operation such as resizing.

I am assuming that if I just pass the raw device pointer to the other application, it will not be valid there because the CUDA context is different. Is there a way to share an allocation between the two applications despite the separate contexts? I couldn't find much CUDA documentation on this.

There is a CUDA sample code (simpleIPC) that demonstrates how. There are also various forum questions that cover this topic; you can find them with a bit of searching. Here is one. Here is another.
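For a single-GPU, two-process case like yours, the core of such a sample boils down to something like the sketch below (Linux only; the file name `ipc_handle.bin` and the producer/consumer command-line roles are just choices made for this demo). App1 exports a `cudaIpcMemHandle_t` for its allocation; app2 opens that handle and receives a device pointer that is valid in its own context, so no raw address ever crosses the process boundary:

```cpp
// Build: nvcc -o ipc_demo ipc_demo.cu
// Run:   ./ipc_demo producer    then, in another shell:   ./ipc_demo consumer
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err = (call);                                    \
        if (err != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at line %d\n",            \
                    cudaGetErrorString(err), __LINE__);              \
            return 1;                                                \
        }                                                            \
    } while (0)

int main(int argc, char **argv)
{
    const size_t N = 1 << 20;                       // 1 MiB demo buffer

    if (argc > 1 && strcmp(argv[1], "producer") == 0) {
        // App1: allocate device memory and export an IPC handle to it.
        void *d_buf = nullptr;
        CHECK(cudaMalloc(&d_buf, N));
        CHECK(cudaMemset(d_buf, 42, N));            // stand-in for "write image"

        cudaIpcMemHandle_t handle;
        CHECK(cudaIpcGetMemHandle(&handle, d_buf));

        // Exchange the 64-byte handle over any IPC channel; a file here.
        FILE *f = fopen("ipc_handle.bin", "wb");
        fwrite(&handle, sizeof(handle), 1, f);
        fclose(f);

        printf("Handle written; press Enter after the consumer is done.\n");
        getchar();                                  // allocation must outlive the consumer's use
        CHECK(cudaFree(d_buf));
    } else {
        // App2: import the handle and get a pointer valid in *this* context.
        cudaIpcMemHandle_t handle;
        FILE *f = fopen("ipc_handle.bin", "rb");
        if (!f || fread(&handle, sizeof(handle), 1, f) != 1) {
            fprintf(stderr, "run the producer first\n");
            return 1;
        }
        fclose(f);

        void *d_buf = nullptr;
        CHECK(cudaIpcOpenMemHandle(&d_buf, handle, cudaIpcMemLazyEnablePeerAccess));

        unsigned char first;
        CHECK(cudaMemcpy(&first, d_buf, 1, cudaMemcpyDeviceToHost));
        printf("first byte of shared buffer: %d\n", first);  // expect 42

        CHECK(cudaIpcCloseMemHandle(d_buf));
    }
    return 0;
}
```

Note that `cudaIpcOpenMemHandle` must be called in a process other than the one that exported the handle, and the imported pointer is only usable while app1's allocation is still alive.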
