I’m writing two programs on Windows 10 that do roughly the following:
app1: takes an OpenCV GpuMat, does some computation, and saves the (very large) result in memory.
app2: reads that memory and renders it.
At the moment I transfer the data through host memory, and it is too slow (app1 device memory → app1 host memory → app2 host memory → app2 device memory).
What I want to know is whether there is a way to pass the device pointer from app1 to app2.
I know there is a way to do this on Linux with CUDA inter-process communication (IPC). I am also exploring the idea of CUDA contexts, and I found this question from 2009: https://devtalk.nvidia.com/default/topic/418234/?comment=2920332#reply
However, at that time contexts were tied to threads. That changed later to allow multi-threading, and contexts are now per device, per process: https://devtalk.nvidia.com/default/topic/519087/cuda-context-and-threading/?offset=4
I think each app in my case has its own context, since each one is a separate process. I have also read that it is not possible to pass pointers between contexts. However, the driver API (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX) has functions to pop and push contexts. That gave me the idea that app1 could pop its CUDA context, app2 could somehow push it onto its own stack, and in that way I could retrieve the data pointer created in app1 and use it in app2.
Is this possible? If so, how?
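For reference, here is a minimal sketch of the Linux CUDA IPC pattern I mentioned above. The `cudaIpcGetMemHandle` / `cudaIpcOpenMemHandle` calls are the documented runtime API, but shipping the handle over MPI and the rank roles are my own assumptions, and as I understand it this API is limited to Linux (and Windows devices in TCC mode), which is why I am not sure it applies to my Windows 10 setup:

```cpp
// Sketch only: assumes app1 is MPI rank 0 and app2 is MPI rank 1,
// and that the IPC handle is shipped over MPI like any other bytes.
#include <cuda_runtime.h>
#include <mpi.h>

// --- app1 (rank 0): export the device allocation ---
void exportBuffer(float* d_outIm) {
    cudaIpcMemHandle_t handle;
    cudaIpcGetMemHandle(&handle, d_outIm);   // handle is a plain 64-byte struct
    MPI_Send(&handle, sizeof(handle), MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    // d_outIm must stay allocated (and app1 alive) until app2 is done with it
}

// --- app2 (rank 1): map the same allocation into its own context ---
float* importBuffer() {
    cudaIpcMemHandle_t handle;
    MPI_Recv(&handle, sizeof(handle), MPI_BYTE, 0, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    void* d_ptr = nullptr;
    cudaIpcOpenMemHandle(&d_ptr, handle, cudaIpcMemLazyEnablePeerAccess);
    return static_cast<float*>(d_ptr);       // valid until cudaIpcCloseMemHandle
}
```

Note that with this pattern no context is transferred at all; each process keeps its own context and only the allocation is mapped into both.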
// App1: receives a data stream via MPI into host memory as an image h_inIm;
// I also have access to ncols and nrows.
// There are other ways to create the GpuMat, e.g. wrapping h_inIm in a cv::Mat header
// and then using cv::cuda::GpuMat::upload to create inImg. However, I care about the
// CUDA context, so I use runtime API calls to make sure a context has been created.
CUresult a;
CUcontext pctx;
cudaSetDevice(0); // the runtime API initializes the device's primary context on first use
const size_t arraySz = nrows * ncols * sizeof(float);
const size_t step = ncols * sizeof(float);
float * d_inIm;
cudaMalloc((void **) &d_inIm, arraySz);
cudaMemcpy(d_inIm, h_inIm, arraySz, cudaMemcpyHostToDevice);
cv::cuda::GpuMat inImg(nrows, ncols, CV_32F, d_inIm, step); // wrap the existing device buffer
cv::cuda::GpuMat outImg(nrows, ncols, CV_32F);
//... do some operation with OpenCV on the GPU, so d_inIm becomes d_outIm
someFunc(inImg,outImg);
// Now I want to pass a pointer to outImg (i.e. outImg.data) through an MPI stream to app2.
// I do not want to go through the host to pass the data.
// My problem is that as soon as app1 finishes, outImg's pointer goes out of scope
// and I cannot pass it. I think here I can do something like:
a = cuCtxGetCurrent(&pctx);
assert(a == CUDA_SUCCESS);
a = cuCtxPopCurrent(&pctx);
// Can I send pctx through an MPI stream? If so, how would app2 acquire it?
// My intuition tells me the context will be destroyed when app1 finishes.
// Or could a = cuCtxPopCurrent(&pctx); be app1's last call,
// and then app2's first call be a = cuCtxPushCurrent(pctx); ?
I would really appreciate any help you can provide.