CUDA 4.1 RC1: "Peer-to-peer communication between processes"?

OK, I’ve been digging through the CUDA 4.1 RC1 release notes, documentation, and SDK, trying to find a description of a feature mentioned in the announcement email:

“Peer-to-peer communication between processes”

Anyone know where to find more information about this?

Check out the cudaIpc interfaces. Let me know if there are problems with them, as my team is responsible for them.

Ah, I see. So the trick is to copy this cudaIpcMemHandle_t struct between processes, presumably by inheriting it through fork() or by stuffing the struct's bytes into a pipe and casting them back to the right type on the other side…
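Since cudaIpcMemHandle_t is just a fixed-size struct of bytes (CUDA_IPC_HANDLE_SIZE, 64 bytes in the headers I have looked at), no real cast should be needed: the receiver can read() straight into a variable of that type. The one subtlety is that read() and write() on a pipe may move fewer bytes than requested, so it is safer to loop until the whole struct has arrived. Below is a sketch in plain POSIX C. write_all, read_all and roundtrip_demo are hypothetical helpers, and the demo sends a 64-byte dummy blob instead of a real handle so it runs without a GPU. One hedged caveat for the fork() route: as far as I know, a child process cannot use a CUDA context inherited from its parent, so fork before making any CUDA call.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write exactly n bytes to fd, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t n)
{
    const unsigned char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Read exactly n bytes from fd. Returns 0 on success, -1 on error or
   if the writer closes its end before n bytes have arrived. */
static int read_all(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return -1;          /* EOF: sender gave up early */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Demo: a forked child plays the exporter and sends a 64-byte dummy blob
   (a stand-in for a cudaIpcMemHandle_t) to the parent through a pipe.
   Returns 0 if the parent received exactly the bytes the child sent. */
static int roundtrip_demo(void)
{
    unsigned char sent[64], got[64];
    for (size_t i = 0; i < sizeof sent; ++i)
        sent[i] = (unsigned char)(i * 7 + 1);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {             /* child: sending side */
        close(fds[0]);
        int rc = write_all(fds[1], sent, sizeof sent);
        _exit(rc == 0 ? 0 : 1);
    }

    close(fds[1]);              /* parent: receiving side */
    int rc = read_all(fds[0], got, sizeof got);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (rc != 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return memcmp(sent, got, sizeof sent) == 0 ? 0 : -1;
}
```

With a real handle, the exporter would call write_all(fd, &handle, sizeof handle), and the importer would declare a cudaIpcMemHandle_t h, call read_all(fd, &h, sizeof h), and then pass h to cudaIpcOpenMemHandle.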

Any chance of getting an IPC example in the SDK before 4.1 final is released?

BTW, I should mention that I’m really excited about this feature because the code I’m currently working on spawns several CPU-only processes whose entire job is to produce input data for another process that does the CUDA stuff. (Due to the “legacy” nature of the CPU code, they have to run in separate processes, rather than threads of the GPU-controlling process.) Right now I have to serialize that data into a pipe, send it over to the GPU process, load it into the GPU, and then finally start the CUDA code. Multi-process access to the same GPU memory will streamline that enormously.

I do have one question: Is there any way to transfer ownership of a device allocation between processes? My processes operate in a pipeline fashion, so it would be most convenient to allocate memory in process A, send an IPC memory handle to process B, then free the device memory in process B when it is done using it.

I suspect that the API does not allow this, but I just wanted to check.

This is a very exciting feature for us. Some text in the programming guide on how to use these features would be helpful.