Hey guys,
I strongly believe there is a way to share device pointers between host-side processes on Windows or any other OS, and I'm surprised that no implementations or examples have been published yet.
I checked the Linux/Unix-based example and understood that it implements this functionality with Driver API calls wrapped in CUDA Runtime API calls.
I found out that I can pin (page-lock) host memory that lives in a shared-memory segment (e.g. one provided by the Boost library) mapped by several host processes. I can then use the standard way to transfer pinned data between host and device:
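// hostPtr points into a shared-memory segment allocated earlier (e.g. using Boost)
cudaHostRegister(hostPtr, hostPtrSize, cudaHostRegisterDefault); // page-lock the existing host range
cudaMalloc((void **) &devicePtr, hostPtrSize);                    // device buffer of the same size
cudaMemcpy(devicePtr, hostPtr, hostPtrSize, cudaMemcpyHostToDevice); // copy from pinned host memory
cudaHostUnregister(hostPtr);                                      // unpin once transfers are done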
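One page-size detail worth checking: I believe early CUDA 4.x releases documented that the pointer and size passed to cudaHostRegister had to be aligned to the host page size, and a buffer allocated inside a Boost managed segment generally does not start on a page boundary. Below is a minimal sketch, assuming a POSIX host (sysconf) and a power-of-two page size, that computes the page-aligned range covering a buffer. The names page_range, host_page_size and cover_with_pages are hypothetical helpers, not part of CUDA or Boost. On Windows the page size would come from GetSystemInfo instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Page-aligned range covering a buffer (hypothetical helper type). */
typedef struct {
    uintptr_t begin;  /* start rounded down to a page boundary */
    size_t    length; /* whole number of pages */
} page_range;

/* Host page size via POSIX sysconf; falls back to 4 KiB if the query fails. */
static size_t host_page_size(void) {
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096;
}

/* Smallest page-aligned range containing [addr, addr + size).
 * Assumes page is a power of two and addr + size does not overflow. */
static page_range cover_with_pages(uintptr_t addr, size_t size, size_t page) {
    assert(page != 0 && (page & (page - 1)) == 0);
    uintptr_t mask = ~(uintptr_t)(page - 1);
    uintptr_t end = addr + size;
    page_range r;
    r.begin = addr & mask;                                     /* round start down */
    r.length = (size_t)(((end + page - 1) & mask) - r.begin);  /* round end up */
    return r;
}
```

Usage would look like page_range r = cover_with_pages((uintptr_t)hostPtr, hostPtrSize, host_page_size()); followed by cudaHostRegister((void *)r.begin, r.length, cudaHostRegisterDefault). Because the rounded range can overlap neighbouring objects in the segment, registering the whole mapped segment once (the OS maps it starting on a page boundary) may be simpler than registering individual buffers.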
As a result, I can keep my data in host shared memory and trigger the transfer calls through the same shared-memory IPC, avoiding the latency of copying data between host processes.
So, before I go and implement this mechanism, I want to clarify what issues I might run into when pinning shared memory used by several host processes through the driver (page-lock limits, page-size or alignment restrictions, etc.). Any ideas or suggestions on what to check?