How to use portable pinned memory with multiple GPUs

Hi, does anybody know how to use portable pinned memory with multiple GPUs?

In the CUDA SDK there is a sample called “simpleMultiGPU”.

What it does for multiple GPUs is basically pack all the data into one structure, create one CPU thread per GPU, and then have each thread send its structure to its GPU and launch the kernel.

Part of the code is:

for (i = 0; i < GPU_N; i++)
    threadID[i] = cutStartThread((CUT_THREADROUTINE)solverThread, (void *)(plan + i));
cutWaitForThreads(threadID, GPU_N);

However, for communication between GPUs, the sample can only copy the data back to the CPU and then synchronize.

I heard that portable pinned memory is available to multiple GPUs, so that each GPU can read and write that memory directly. Can anybody show me a small sample of how to do that in the multi-GPU case?

Really appreciate your help!

Has nobody ever done this before?
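
For what it's worth, here is a minimal sketch of what the question above is asking for. It is not taken from the SDK sample. It assumes CUDA 4.0 or later, so a single host thread can switch GPUs with cudaSetDevice instead of starting one CPU thread per GPU, and it assumes every GPU supports mapped host memory. The buffer is allocated with cudaHostAllocPortable (pinned for all contexts) together with cudaHostAllocMapped, because the portable flag alone only makes the pinning visible to every GPU; the mapped flag is what lets a kernel address the host buffer directly. The names fillSlice, runPortablePinnedDemo and perGpu are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each GPU writes `value` into its own slice
// of the shared host buffer.
__global__ void fillSlice(int *slice, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) slice[i] = value;
}

// Hypothetical helper: returns the number of wrong elements,
// or -1 if there is no GPU or a CUDA call fails.
int runPortablePinnedDemo(int perGpu)
{
    int gpuCount = 0;
    if (cudaGetDeviceCount(&gpuCount) != cudaSuccess || gpuCount == 0)
        return -1;

    // One host buffer shared by all GPUs.
    // Portable: treated as pinned by every CUDA context.
    // Mapped:   mapped into the device address space, so kernels can
    //           read and write it without an explicit cudaMemcpy.
    // (Assumption: on very old toolkits, cudaSetDeviceFlags(cudaDeviceMapHost)
    //  had to be called per device before its context was created.)
    int *hostBuf = nullptr;
    size_t bytes = (size_t)gpuCount * perGpu * sizeof(int);
    if (cudaHostAlloc((void **)&hostBuf, bytes,
                      cudaHostAllocPortable | cudaHostAllocMapped) != cudaSuccess)
        return -1;

    bool ok = true;
    for (int dev = 0; dev < gpuCount && ok; ++dev) {
        ok = (cudaSetDevice(dev) == cudaSuccess);

        // Ask the current device for its view of the host buffer.
        int *devView = nullptr;
        ok = ok && (cudaHostGetDevicePointer((void **)&devView, hostBuf, 0) == cudaSuccess);
        if (!ok) break;

        // GPU `dev` fills slice `dev` with the value dev + 1.
        int blocks = (perGpu + 255) / 256;
        fillSlice<<<blocks, 256>>>(devView + (size_t)dev * perGpu, perGpu, dev + 1);
        ok = (cudaGetLastError() == cudaSuccess);
    }

    // Wait for every GPU before the CPU reads the buffer.
    for (int dev = 0; dev < gpuCount; ++dev) {
        if (cudaSetDevice(dev) != cudaSuccess) ok = false;
        if (cudaDeviceSynchronize() != cudaSuccess) ok = false;
    }

    // Check the result on the host; no copy back is needed.
    int bad = 0;
    if (ok) {
        for (int dev = 0; dev < gpuCount; ++dev)
            for (int i = 0; i < perGpu; ++i)
                if (hostBuf[(size_t)dev * perGpu + i] != dev + 1) ++bad;
    }

    cudaFreeHost(hostBuf);
    return ok ? bad : -1;
}
```

Calling runPortablePinnedDemo(1024) from main and checking that it returns 0 would be one way to try it. Note that every access to mapped memory goes over PCIe, and nothing here synchronizes the GPUs with each other, so two GPUs writing the same element at the same time would still need a synchronization point on the host between kernel launches.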