How can I pass data across two contexts? (cuMemcpyPeer across contexts)

Hi, I have an application that starts two processes, each with a context on its own CUDA device. Later it needs to copy data from one device to the other. I assume this is the purpose of cuMemcpyPeer(). The two CUcontext handles and the two CUdeviceptr values are shared between the two processes through IPC. When I try the following:
cuMemcpyPeer(dev_ptr[0], context[0], dev_ptr[1], context[1], size);
it returns successfully, but the data copied across the two contexts is actually wrong.
How am I supposed to do this?

Thanks!

I asked a similar question a while back, and got no reply.

MemcpyPeer seems to work only outside of a parallel region. :(