I have a situation where I have large volumes of data to process on the GPU, but I also need several independent CPU processes to have access to that data. I want to avoid each process having to transfer the same data to the GPU separately.
In a perfect world I would push the data to the GPU once; process 1 would process it with one set of kernels, then process 2 would run a different set of kernels against the same data. Much like shared memory on the CPU, where multiple processes can read from a common memory region, but on the GPU instead.
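For concreteness, the kind of flow I'm imagining looks something like the sketch below. I've come across the CUDA IPC calls (`cudaIpcGetMemHandle` / `cudaIpcOpenMemHandle`), and this is my rough understanding of how they might be used; the file- or socket-based handoff of the handle between processes is just a placeholder, and in reality process 1 and process 2 would be separate executables:

```cuda
#include <cuda_runtime.h>

__global__ void myKernel(float* d, size_t n) {
    // process 2's own processing of the shared data would go here
}

int main() {
    const size_t N = 1 << 20;

    // --- Process 1: owns the allocation and exports a handle ---
    float* d_data;
    cudaMalloc(&d_data, N * sizeof(float));
    // ... copy the large data set to the GPU once ...

    cudaIpcMemHandle_t handle;
    cudaIpcGetMemHandle(&handle, d_data);
    // hand `handle` to process 2 via any CPU-side IPC (file, socket, shm)

    // --- Process 2 (a separate process in reality): imports the handle ---
    float* d_shared;
    cudaIpcOpenMemHandle((void**)&d_shared, handle,
                         cudaIpcMemLazyEnablePeerAccess);
    myKernel<<<(N + 255) / 256, 256>>>(d_shared, N);  // no second upload
    cudaIpcCloseMemHandle(d_shared);

    cudaFree(d_data);
    return 0;
}
```

I'm not certain this API actually covers my use case (or what its platform/device restrictions are), which is essentially what I'm asking.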
Is this supported, or even possible, under the current CUDA framework?