I'm using a DGX A100 and I made two processes (P0, P1).
My workflow is as follows:
Make two parameters (par0 on GPU0, par1 on GPU1) in P0.
Write custom data to par0.
cudaMemcpy(par0 → par1)
Read par1 in P1 to use the custom data.
For this, I had an idea like the one below, but it causes errors.
Using shared memory: P0 sends the pointer of par1 to P1, and P1 reads par1 through that pointer. ===> I think this is what causes the error: “segmentation fault (core dumped)”
To share device memory pointers and events across processes, an application must use the Inter Process Communication API, which is described in detail in the reference manual. The IPC API is only supported for 64-bit processes on Linux and for devices of compute capability 2.0 and higher. Note that the IPC API is not supported for cudaMallocManaged allocations.
Using this API, an application can get the IPC handle for a given device memory pointer using cudaIpcGetMemHandle(), pass it to another process using standard IPC mechanisms (for example, interprocess shared memory or files), and use cudaIpcOpenMemHandle() to retrieve a device pointer from the IPC handle that is a valid pointer within this other process. Event handles can be shared using similar entry points.
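In code, the flow the quoted passage describes looks roughly like this. This is only a sketch, not the actual simpleIPC source: error handling is trimmed, the buffer size is arbitrary, and the transport used to pass the handle between processes (file, pipe, or shared memory) is only indicated in comments.

```c
#include <cuda_runtime.h>

/* Process 0: allocate device memory and export an IPC handle for it. */
void exporter(void) {
    float *d_par0;
    cudaMalloc((void **)&d_par0, 1024 * sizeof(float));

    cudaIpcMemHandle_t handle;
    cudaIpcGetMemHandle(&handle, d_par0);
    /* Pass `handle` (an opaque 64-byte struct) to the other process via any
     * ordinary IPC mechanism: a file, a pipe, or POSIX shared memory.
     * Sending the raw device pointer d_par0 itself would NOT work. */
}

/* Process 1: turn the received handle back into a device pointer. */
void importer(const cudaIpcMemHandle_t *handle) {
    void *d_par0_mapped;
    cudaIpcOpenMemHandle(&d_par0_mapped, *handle,
                         cudaIpcMemLazyEnablePeerAccess);
    /* d_par0_mapped is now a valid device pointer in this process; kernels
     * or cudaMemcpy can read Process 0's allocation through it. */
    cudaIpcCloseMemHandle(d_par0_mapped);
}
```

Note that only the handle crosses the process boundary; each process gets its own mapping of the same device allocation.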
I built the executable (simpleIPC) with the command “make”,
but I do not know how to test this. Can you check the output below?
lignex1-HP-ZCentral-4R-Workstation:~/Desktop/User/docker/cuda-samples/Samples/0_Introduction/simpleIPC$ ./simpleIPC
Process 0: Starting on device 0…
Step 0 done
Process 0: verifying…
Process 0 complete!
lignex1@lignex1-HP-ZCentral-4R-Workstation:~/Desktop/User/docker/cuda-samples/Samples/0_Introduction/simpleIPC$ ./simpleIPC 0
Process 0: Starting on device 0…
CUDA error at simpleIPC.cu:123 code=400(cudaErrorInvalidResourceHandle) “cudaIpcOpenMemHandle(&ptr, *(cudaIpcMemHandle_t *)&shm->memHandle[i], cudaIpcMemLazyEnablePeerAccess)”
When I use the command “./simpleIPC” it does not produce an error,
but when I use “./simpleIPC 0” it fails.
The program itself is not intended to be launched by the user with a command-line parameter. It launches a separate process itself and passes a command-line parameter to that process when it does so.
Do you know if my workflow is possible with the simpleIPC example?
Make parameter0 (30–40 GB) on GPU0 in Process0.
Send the pointer of parameter0 to Process1 from Process0.
Read parameter0 in Process1 via the received pointer.
I want to share large data in GPU memory between processes or containers (Docker).
I cannot find the code in simpleIPC where memory is allocated on the GPU, or where the pointer is sent. I think line 270, “checkCudaErrors(cudaMalloc(&ptr, DATA_SIZE));”, is what I am looking for…
If you are able to exchange data between the processes, it should work.
The sample code uses linux shared memory to communicate the IPC handles.
Line 270 allocates the buffer. (It is the only cudaMalloc call in the code)