Multiple independent CPU processes using data in device memory

I have a situation where I have large volumes of data to process on the GPU; however, I need several different, independent processes on the CPU to have access to that data. I want to avoid each process having to transfer the same data to the GPU.

In a perfect world, I would push the data to the GPU once; process 1 would work on it with one set of kernels, and then process 2 would run a different set of kernels against the same data. Much like shared memory on a CPU, where multiple processes can read from the same shared memory segment, but on the GPU instead.

Is this supported or even possible under the current CUDA framework?

As of CUDA 4.0, separate processes cannot share data in device memory; each process gets its own device context with its own address space. What CUDA 4.0 does add is that all host threads within a single process share one device context, so multiple threads can operate on the same device allocation. I have not tried to use the API this way myself, but it sounds like it could be done with multiple threads.
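The multi-threaded alternative could be sketched roughly as below. This is an untested outline, not a verified implementation: the kernel names and the two-stage split are hypothetical, it requires a CUDA-capable GPU, and it assumes the CUDA 4.0+ runtime behavior where all host threads in a process share one context per device.

```cuda
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

// Hypothetical kernels standing in for the two independent workloads.
__global__ void scaleKernel(float *d, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

__global__ void offsetKernel(float *d, int n, float c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += c;
}

struct Work { float *d_data; int n; };

// Each host thread targets device 0; with the CUDA 4.0+ runtime all
// threads in the process share one context, so d_data is valid in both.
static void *stage1(void *arg) {
    Work *w = (Work *)arg;
    cudaSetDevice(0);
    scaleKernel<<<(w->n + 255) / 256, 256>>>(w->d_data, w->n, 2.0f);
    cudaDeviceSynchronize();
    return NULL;
}

static void *stage2(void *arg) {
    Work *w = (Work *)arg;
    cudaSetDevice(0);
    offsetKernel<<<(w->n + 255) / 256, 256>>>(w->d_data, w->n, 1.0f);
    cudaDeviceSynchronize();
    return NULL;
}

int main(void) {
    Work w;
    w.n = 1 << 20;
    cudaMalloc(&w.d_data, w.n * sizeof(float));   // data uploaded once
    cudaMemset(w.d_data, 0, w.n * sizeof(float));

    pthread_t t;
    pthread_create(&t, NULL, stage1, &w);   // "process 1" work
    pthread_join(t, NULL);                  // serialize the two stages
    pthread_create(&t, NULL, stage2, &w);   // "process 2" work
    pthread_join(t, NULL);

    float first;
    cudaMemcpy(&first, w.d_data, sizeof(float), cudaMemcpyDeviceToHost);
    printf("first element = %f\n", first);
    cudaFree(w.d_data);
    return 0;
}
```

The join between the two stages is deliberate: without it, the second kernel could run before the first finishes, since the threads share the context but are not otherwise ordered.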