How to access GPU memory between processes

I'm using a DGX A100 and have created two processes (P0, P1).

My workflow is as follows.

  1. Create two parameters (par0 on GPU0, par1 on GPU1) in P0.
  2. Write custom data to par0.
  3. cudaMemcpy(par0 → par1)
  4. Read par1 in P1 to use the custom data.
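
For reference, steps 1 to 3 inside a single process might look like the minimal sketch below. The buffer size, the fill value, and the `CHECK` macro are placeholders introduced for illustration, and the sketch assumes at least two GPUs are visible. Step 4 is the part that needs a cross-process mechanism, because a device pointer obtained in P0 is not valid in P1.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative error-checking helper (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const size_t kBytes = 1 << 20;  // placeholder size
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        fprintf(stderr, "this sketch needs two GPUs\n");
        return EXIT_FAILURE;
    }

    // Stand-in for the "custom data" from the question.
    std::vector<unsigned char> custom(kBytes, 0x5A);

    // Step 1: one buffer per GPU, both owned by this process.
    void *par0 = nullptr, *par1 = nullptr;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&par1, kBytes));
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&par0, kBytes));

    // Step 2: write the custom data into par0 on GPU0.
    CHECK(cudaMemcpy(par0, custom.data(), kBytes, cudaMemcpyHostToDevice));

    // Step 3: copy GPU0 -> GPU1.
    CHECK(cudaMemcpyPeer(par1, 1, par0, 0, kBytes));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(par0));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(par1));
    return EXIT_SUCCESS;
}
```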

For step 4 I have a few ideas, listed below, but I expect them to fail or I don't know how to apply them.

  1. Using shared memory: P0 sends the pointer to par1 to P1, and P1 reads par1 through that pointer. I think this will cause an error: “segmentation fault (core dumped)”.
  2. Using mmap: how can I use this?
  3. MPS: is this appropriate for my case?

Please let me know of any solution or example.

This is explained in the CUDA C++ Programming Guide:

To share device memory pointers and events across processes, an application must use the Inter Process Communication API, which is described in detail in the reference manual. The IPC API is only supported for 64-bit processes on Linux and for devices of compute capability 2.0 and higher. Note that the IPC API is not supported for cudaMallocManaged allocations.
Using this API, an application can get the IPC handle for a given device memory pointer using cudaIpcGetMemHandle(), pass it to another process using standard IPC mechanisms (for example, interprocess shared memory or files), and use cudaIpcOpenMemHandle() to retrieve a device pointer from the IPC handle that is a valid pointer within this other process. Event handles can be shared using similar entry points.
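
As a rough illustration of the quoted passage, the sketch below shares one allocation between a parent process and a forked child, passing the handle through a pipe. The pipe, the buffer size, the fill value, and the `CHECK` macro are choices made for this example only; the `cudaIpc*` calls are the ones named in the guide. It assumes a single GPU (device 0) on Linux.

```cuda
// Assumed build line: nvcc -o ipc_pipe ipc_pipe.cu
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cuda_runtime.h>

// Illustrative error-checking helper (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const size_t kBytes = 1 << 20;  // placeholder size
    int fds[2];
    if (pipe(fds) != 0) { perror("pipe"); return EXIT_FAILURE; }

    // Fork before the first CUDA call so each process creates its own context.
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return EXIT_FAILURE; }

    if (pid == 0) {
        // Child (P1): receive the handle, then map the parent's allocation.
        close(fds[1]);
        cudaIpcMemHandle_t handle;
        if (read(fds[0], &handle, sizeof(handle)) != (ssize_t)sizeof(handle)) {
            fprintf(stderr, "child: could not read handle\n");
            return EXIT_FAILURE;
        }
        CHECK(cudaSetDevice(0));
        void *remote = nullptr;
        CHECK(cudaIpcOpenMemHandle(&remote, handle,
                                   cudaIpcMemLazyEnablePeerAccess));
        unsigned char first = 0;
        CHECK(cudaMemcpy(&first, remote, 1, cudaMemcpyDeviceToHost));
        printf("child read 0x%02x\n", first);
        CHECK(cudaIpcCloseMemHandle(remote));
        return first == 0xAB ? EXIT_SUCCESS : EXIT_FAILURE;
    }

    // Parent (P0): allocate, fill, export the handle, send it to the child.
    close(fds[0]);
    CHECK(cudaSetDevice(0));
    void *buf = nullptr;
    CHECK(cudaMalloc(&buf, kBytes));
    CHECK(cudaMemset(buf, 0xAB, kBytes));
    CHECK(cudaDeviceSynchronize());
    cudaIpcMemHandle_t handle;
    CHECK(cudaIpcGetMemHandle(&handle, buf));
    if (write(fds[1], &handle, sizeof(handle)) != (ssize_t)sizeof(handle)) {
        fprintf(stderr, "parent: could not send handle\n");
        return EXIT_FAILURE;
    }

    // Keep the allocation alive until the child has finished with it.
    int status = 0;
    waitpid(pid, &status, 0);
    CHECK(cudaFree(buf));
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? EXIT_SUCCESS
                                                           : EXIT_FAILURE;
}
```

Note that what crosses the process boundary is the opaque handle, not the raw device pointer; the child gets its own valid pointer from `cudaIpcOpenMemHandle()`.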


Thanks for your advice.

Do you know where I can find an example that is implemented with two processes?

The CUDA samples include an IPC sample.

Thanks for the link. I'm trying to compile the simpleIPC example, but I get some errors.

HP-ZCentral-4R-Workstation:~/Desktop/User/docker/cuda-samples-master/Samples/0_Introduction/simpleIPC$ nvcc simpleIPC.cu

/bin/ld: /tmp/tmpxft_00001584_00000000-11_simpleIPC.o: in function `childProcess(int)':
tmpxft_00001584_00000000-6_simpleIPC.cudafe1.cpp:(.text+0x18b): undefined reference to `sharedMemoryOpen(char const*, unsigned long, sharedMemoryInfo_st*)'
/bin/ld: /tmp/tmpxft_00001584_00000000-11_simpleIPC.o: in function `parentProcess(char*)':
tmpxft_00001584_00000000-6_simpleIPC.cudafe1.cpp:(.text+0xa2a): undefined reference to `sharedMemoryCreate(char const*, unsigned long, sharedMemoryInfo_st*)'
/bin/ld: tmpxft_00001584_00000000-6_simpleIPC.cudafe1.cpp:(.text+0xfd5): undefined reference to `spawnProcess(int*, char const*, char* const*)'
/bin/ld: tmpxft_00001584_00000000-6_simpleIPC.cudafe1.cpp:(.text+0x1068): undefined reference to `waitProcess(int*)'
/bin/ld: tmpxft_00001584_00000000-6_simpleIPC.cudafe1.cpp:(.text+0x11db): undefined reference to `sharedMemoryClose(sharedMemoryInfo_st*)'
collect2: error: ld returned 1 exit status

Do you know why I get this error?

Thank you.

The sample consists of multiple files. Use the provided Makefile to compile the project.

Thank you for helping me.

I built the executable (simpleIPC) with the command “make”,

but I'm not sure how to test it. Could you check the output below?

lignex1-HP-ZCentral-4R-Workstation:~/Desktop/User/docker/cuda-samples/Samples/0_Introduction/simpleIPC$ ./simpleIPC
Process 0: Starting on device 0…
Step 0 done
Process 0: verifying…
Process 0 complete!

lignex1@lignex1-HP-ZCentral-4R-Workstation:~/Desktop/User/docker/cuda-samples/Samples/0_Introduction/simpleIPC$ ./simpleIPC 0
Process 0: Starting on device 0…
CUDA error at simpleIPC.cu:123 code=400(cudaErrorInvalidResourceHandle) "cudaIpcOpenMemHandle(&ptr, *(cudaIpcMemHandle_t *)&shm->memHandle[i], cudaIpcMemLazyEnablePeerAccess)"

When I run “./simpleIPC”, there is no error,
but when I run “./simpleIPC 0”, I get the error above.

I cannot help you with the program.

The program itself is not intended to be launched by the user with a command-line parameter. It launches a separate process on its own and passes a command-line parameter to that process when it does so.

See here.


Thank you @striker159, @Robert_Crovella.

Do you know whether my use case is possible with the simpleIPC example?

  1. Create parameter0 (30~40 GB) on GPU0 in Process0.
  2. Send the pointer to parameter0 from Process0 to Process1.
  3. Read parameter0 in Process1 via the received pointer.

I want to share large data in GPU memory between processes or containers (Docker).

I cannot find where simpleIPC allocates memory on the GPU or where it sends the pointer. I think line 270, “checkCudaErrors(cudaMalloc(&ptr, DATA_SIZE));”, is what I am looking for…

If you are able to exchange data between the processes, it should work.
The sample code uses Linux shared memory to communicate the IPC handles.
Line 270 allocates the buffer. (It is the only cudaMalloc call in the code)
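
To make the idea of passing the handle through Linux shared memory concrete, here is a minimal sketch that is separate from the sample's own code. It uses `shm_open` and `mmap` to publish a `cudaIpcMemHandle_t` from an exporter process to an independently started importer. The `Mailbox` struct, the segment name `/cuda_ipc_demo`, the flag protocol, the buffer size, and the `CHECK` macro are all hypothetical choices for this example.

```cuda
// Assumed build line: nvcc -o ipc_shm ipc_shm.cu -lrt
#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cuda_runtime.h>

// Illustrative error-checking helper (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical layout of the shared segment.
struct Mailbox {
    std::atomic<int> ready;     // exporter sets 1 once `handle` is valid
    std::atomic<int> done;      // importer sets 1 when it has finished
    cudaIpcMemHandle_t handle;  // opaque handle for the device allocation
};
static const char *kShmName = "/cuda_ipc_demo";  // hypothetical name

static Mailbox *map_mailbox(bool create) {
    int fd = shm_open(kShmName, create ? (O_CREAT | O_RDWR) : O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); exit(EXIT_FAILURE); }
    if (create && ftruncate(fd, sizeof(Mailbox)) != 0) {
        perror("ftruncate");
        exit(EXIT_FAILURE);
    }
    void *p = mmap(nullptr, sizeof(Mailbox), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    return static_cast<Mailbox *>(p);
}

int main(int argc, char **argv) {
    const size_t kBytes = 1 << 20;  // placeholder size
    if (argc == 2 && strcmp(argv[1], "export") == 0) {
        Mailbox *mb = map_mailbox(true);
        mb->ready.store(0);  // clear flags left over from an earlier run
        mb->done.store(0);
        CHECK(cudaSetDevice(0));
        void *buf = nullptr;
        CHECK(cudaMalloc(&buf, kBytes));
        CHECK(cudaMemset(buf, 0x5A, kBytes));
        CHECK(cudaDeviceSynchronize());
        CHECK(cudaIpcGetMemHandle(&mb->handle, buf));
        mb->ready.store(1, std::memory_order_release);
        // The allocation must outlive every importer that has it open.
        while (mb->done.load(std::memory_order_acquire) == 0) usleep(1000);
        CHECK(cudaFree(buf));
        munmap(mb, sizeof(Mailbox));
        shm_unlink(kShmName);
    } else if (argc == 2 && strcmp(argv[1], "import") == 0) {
        Mailbox *mb = map_mailbox(false);
        while (mb->ready.load(std::memory_order_acquire) == 0) usleep(1000);
        CHECK(cudaSetDevice(0));  // a different GPU would rely on peer access
        void *remote = nullptr;
        CHECK(cudaIpcOpenMemHandle(&remote, mb->handle,
                                   cudaIpcMemLazyEnablePeerAccess));
        unsigned char first = 0;
        CHECK(cudaMemcpy(&first, remote, 1, cudaMemcpyDeviceToHost));
        printf("importer read 0x%02x\n", first);
        CHECK(cudaIpcCloseMemHandle(remote));
        mb->done.store(1, std::memory_order_release);
        munmap(mb, sizeof(Mailbox));
    } else {
        fprintf(stderr, "usage: %s export|import\n", argv[0]);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

Start “./ipc_shm export” first and then “./ipc_shm import” in a second shell. An importer that reads a handle whose exporter is no longer running would be expected to fail in `cudaIpcOpenMemHandle()`, which is consistent with what happens when a child role is started by hand without its parent.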